adrianlyjak / obsidian-aloud-tts

License: MIT License

JavaScript 2.17% TypeScript 95.77% CSS 1.27% Shell 0.60% HTML 0.18%

obsidian-aloud-tts's Issues

Feature Request: 1.15 speed

I find 1.0 speed too slow, and 1.25 too fast. Wondering if we could have the option to set the speed manually or have a 1.15 option? Thanks.

Make the audio visualizer nicer

The audio visualization seems like it's frequently getting truncated, and it overemphasizes low audio ranges.

Adjust it and make it nicer.

Add support for parler-tts

It would be pretty cool to support Parler-TTS, an open source TTS with fairly good quality and voice customization.

If running the model locally, the inference code is currently Python only. Perhaps there could be a bridge, and this would be a Node.js-only feature.

I don't know how much work it would be to make the model work with JavaScript + ONNX, but that would also be pretty cool, and likely useful to others for doing better on-device TTS.

Background audio in mobile

When running the player on a mobile device, such as an iPhone, I want the text player to continue to play while the device is locked, and to show an OS-level play/pause/track-switcher control.

audio is regenerated frequently

When I exit the note, re-enter it, and hit play, the audio sometimes gets regenerated from scratch. It also generally happens when I change my typing cursor position inside the note and hit play. I'm assuming this has something to do with my cursor position, since the plugin always seems inclined to regenerate text it has already generated: it can't seem to consistently remember which already-generated audio file pairs with which sentence, so all it can do is generate new audio.

It also seems that the plugin generates new audio directly from OpenAI if I increase or decrease the speed. Isn't there a way to just play the audio faster without having to generate new audio? Like how you would download a video from YouTube and play it at faster speeds just fine in VLC media player. This would probably also allow the plugin to play the audio at 3x, 4x, or 5x, which would be really nice for digesting information from a note quickly without having to slowly read.

Honestly, all this comes down to my wallet crying in agony :). It would also be nice if, in the future, we could use local models for notes. Or, at least, have audio be generated once and then embedded into the note once and for all.
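One way to avoid the regeneration described in this issue is to key cached audio on the content that actually determines the sound, and nothing else. The sketch below is only an illustration, not the plugin's actual code: `AudioRequest`, `fnv1a`, and `audioCacheKey` are hypothetical names, and a real implementation would likely use a stronger hash than 32-bit FNV-1a.

```typescript
// Hypothetical request shape; the field names are assumptions, not the plugin's types.
interface AudioRequest {
  text: string;  // the sentence to synthesize
  voice: string; // voice name
  model: string; // model name
}

// FNV-1a style 32-bit hash over UTF-16 code units, returned as 8 hex digits.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Playback speed and cursor position are deliberately left out of the key,
// so changing either one maps to the same cached audio.
function audioCacheKey(req: AudioRequest): string {
  return fnv1a(JSON.stringify([req.model, req.voice, req.text.trim()]));
}
```

With a key like this, the same sentence always resolves to the same cache entry regardless of where playback started. It only helps if speed is applied at playback time rather than at generation time.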

Add support for Eleven Labs

Eleven Labs has a lot of high quality voices and voice configuration. It would be nice to support them as well. Initially this could support just their canned voices, but it could also eventually support custom voices.

Feature Request: Support for Microsoft Azure TTS API

Azure TTS is the best TTS service I've ever used. You can try this service on this website. Additionally, it supports many different programming languages.

OpenAI's TTS is trained on multiple languages, but compared to specialized TTS models trained on a single language, it still falls short in Chinese, Japanese, and other languages compared to Azure TTS's single-language versions.

May I ask if Aloud can provide support for the Azure TTS API? Thanks.

Remove extraneous text, such as links

Obsidian notes frequently contain links. It's not very helpful to read the links aloud.

Strip links and other non-useful markdown characters before converting text to speech.
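A rough sketch of what such a cleanup step could look like, assuming plain regular expressions are good enough for a first pass. `stripLinksForSpeech` is a hypothetical helper, not something taken from the plugin, and it only handles wikilinks, inline markdown links, and bare URLs.

```typescript
// Hypothetical pre-processing step applied to text before it is sent to the TTS service.
function stripLinksForSpeech(markdown: string): string {
  return markdown
    // [[target|alias]] -> alias
    .replace(/\[\[([^\]|]+)\|([^\]]+)\]\]/g, "$2")
    // [[target]] -> target
    .replace(/\[\[([^\]]+)\]\]/g, "$1")
    // [label](url) and ![alt](url) -> label / alt
    .replace(/!?\[([^\]]*)\]\([^)]*\)/g, "$1")
    // bare http(s) URLs are dropped entirely
    .replace(/https?:\/\/\S+/g, "")
    // collapse the gaps left behind
    .replace(/[ \t]{2,}/g, " ")
    .trim();
}
```

For example, `stripLinksForSpeech("See [the docs](https://example.com/docs) and [[Daily Note|today]].")` returns `"See the docs and today."`. A regex pass like this will miss edge cases (nested brackets, URLs containing parentheses), so a markdown parser may be the better long-term choice.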

Stream larger chunks of audio

The player splits text sentence-by-sentence, and renders those to speech. It does this so that content is streamable and navigable.

However, speech synthesized one sentence at a time doesn't sound as good as speech synthesized from longer passages.

Would it be possible to stream the audio chunks from OpenAI, and somehow rethink the navigation chunks within the paragraph?
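For reference, the sentence-by-sentence splitting described in this issue can be sketched roughly as follows. This is an illustration under simple assumptions (sentences end at `.`, `!`, `?`, or a newline), not the plugin's actual splitter; `TextChunk` and `splitIntoSentences` are hypothetical names. Each chunk keeps its offsets so that navigation and highlighting can map back to the note.

```typescript
// A chunk of text plus its character offsets in the original string.
interface TextChunk {
  text: string;
  start: number; // inclusive
  end: number;   // exclusive
}

function splitIntoSentences(text: string): TextChunk[] {
  const chunks: TextChunk[] = [];
  // Run of non-terminator characters, then any terminators, then trailing whitespace.
  const pattern = /[^.!?\n]+[.!?]*\s*/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const raw = match[0];
    const offset = raw.search(/\S/); // first non-whitespace character
    if (offset === -1) continue;     // whitespace-only match, nothing to speak
    const trimmed = raw.trim();
    const start = match.index + offset;
    chunks.push({ text: trimmed, start, end: start + trimmed.length });
  }
  return chunks;
}
```

A naive splitter like this breaks on abbreviations such as "e.g.". Larger navigation units, such as paragraphs, could reuse the same offset bookkeeping with a different pattern.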

[FR] - Ability to permanently mark a piece of text (by wrapping it in a callout or similar)

Hi,

I am uncertain if this is an unusual request. However, I would greatly appreciate the functionality to permanently mark a portion of text, which would subsequently generate a small audio icon alongside it, enabling playback.

Whether this could be done in a callout or as part of an embedded code snippet would be up to you. (There may be better ways apart from what I illustrate below.)

e.g.

> [!tts|1.2 voice]
> Text to play back

```tts
playbackspeed: 1.2
voice: xxx

Text to play back
```

Thanks

Feature Request: bind Pause and Resume hotkeys into one Pause/Resume hotkey

Pause and Resume are very commonly used, but binding them to two different hotkeys creates a disjointed experience, because a different key is needed each time. Most media players bind Pause and Resume to the same hotkey. I hope you can modify this feature.

Thank you very much.

Feature Request: add text highlight customized function

This is a great Obsidian plugin, but the highlight colors for the text being read and the text not yet read are too similar. Could you add an extra configuration option in the settings or an additional configuration JSON file to set the highlight RGBA values?

Thank you very much.

Highlight text correctly, even if it's changed

I want to be able to edit text as it's speaking to me, and not break the editor highlighting.
If the text is ahead of the reader, I want the reader to read the new text.
If a text chunk is repeated, I want the editor to correctly highlight the repeated chunk, even if it's after the first one.
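One possible approach to the repeated-chunk case, sketched under the assumption that the player remembers roughly where each chunk started: search for every occurrence of the chunk and pick the one nearest the remembered offset. `locateChunk` is a hypothetical helper, not part of the plugin.

```typescript
// Returns the start index of the occurrence of `chunk` closest to `expectedStart`,
// or -1 if the chunk no longer appears in the document.
function locateChunk(doc: string, chunk: string, expectedStart: number): number {
  if (chunk.length === 0) return -1; // guard: an empty needle would loop forever
  let best = -1;
  let from = 0;
  while (true) {
    const idx = doc.indexOf(chunk, from);
    if (idx === -1) break;
    if (best === -1 || Math.abs(idx - expectedStart) < Math.abs(best - expectedStart)) {
      best = idx;
    }
    from = idx + 1;
  }
  return best;
}
```

When the text has been edited so that the chunk no longer matches exactly, this returns -1, and the caller would need a fallback such as re-splitting the text from the current position.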

Show my API / cost usage

The API costs money, so I'd like to see cost and usage over time. It might even be nice to set a spending cap.
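A minimal sketch of how usage could be tallied, assuming the provider bills per input character. The rate is passed in as a parameter because pricing varies by model and changes over time; `UsageEntry`, `estimateCostUsd`, and `isOverCap` are hypothetical names for illustration only.

```typescript
// One record per synthesis request.
interface UsageEntry {
  timestamp: number;  // epoch milliseconds
  characters: number; // characters sent for synthesis
}

// Estimated spend in USD, given a price per one million characters.
// The rate is an input, not a known constant: look it up from the provider's pricing page.
function estimateCostUsd(entries: UsageEntry[], usdPerMillionChars: number): number {
  const totalChars = entries.reduce((sum, e) => sum + e.characters, 0);
  return (totalChars * usdPerMillionChars) / 1e6;
}

// True when the estimated spend has reached the user's cap.
function isOverCap(entries: UsageEntry[], usdPerMillionChars: number, capUsd: number): boolean {
  return estimateCostUsd(entries, usdPerMillionChars) >= capUsd;
}
```

A spending cap built on this would check `isOverCap` before each request and refuse to synthesize once the cap is reached.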

Make audio cache device local

Right now audio files are cached for a short duration to make replaying audio faster. The plugin expires audio files after 8 hours. The audio is stored in a .tts directory in the vault. This has the downside of syncing the audio caches. I previously had the caches last much longer, but it was slowing down my iOS syncs significantly.

Is there perhaps instead a good way to store the caches outside the vault so that they don't sync?
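Wherever the cache ends up living, the expiry rule itself is independent of the storage location. A small sketch, using the 8-hour duration mentioned above as the default and keeping it configurable; `CachedAudio` and `expiredEntries` are hypothetical names, not the plugin's own.

```typescript
// Default time-to-live: 8 hours, as described in this issue. Passed as a parameter
// so a longer duration can be used when the cache is not synced.
const DEFAULT_CACHE_TTL_MS = 8 * 60 * 60 * 1000;

interface CachedAudio {
  path: string;      // where the audio file is stored
  createdAt: number; // epoch milliseconds
}

// Returns the entries old enough to be deleted at time `now`.
function expiredEntries(
  entries: CachedAudio[],
  now: number,
  ttlMs: number = DEFAULT_CACHE_TTL_MS
): CachedAudio[] {
  return entries.filter((e) => now - e.createdAt >= ttlMs);
}
```

Keeping the sweep as a pure function of `(entries, now, ttl)` means the same logic would work for a vault folder, a directory outside the vault, or a browser-side store.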

Player showing in all tabs

When playing text, the player is only supposed to play in the selected pane that the text is from, but it's showing in all of the panes.

Is it possible to save the audio for later?

I want to convert text to audio and save it in markdown file for later. Is it possible?

Cache duration.

Hello Adrian, the plugin's cache duration of 168 hours is ideal for avoiding additional charges for new inferences. I was unable to locate the hidden folder on my Windows system, although I'm familiar with the process. The primary purpose is backup. Is there a method to determine which portion of the text has been converted to audio? Any suggestions would be appreciated. Thanks for the plugin.

Feature Request: Skip Markup

I use three kinds of markup in my documents:

Better Bibtex Citekeys (wrapped in square brackets [ @authortitleyear, page ] )
Markdown footnotes (also wrapped in square brackets, but with a caret symbol [^1])
CriticMarkup text for suggestions and comments in markdown (wrapped in curly brackets and angle brackets {>> notes to self <<})

I would like to have Aloud skip text wrapped in these kinds of markup when reading my document, as it would make reading much more fluid without all my notes and citations. Thanks.
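The three kinds of markup listed above could be filtered with a few regular expressions before the text is spoken. This is only a sketch: `stripAnnotations` is a hypothetical helper, and it covers just the bracketed citekey, footnote reference, and CriticMarkup comment forms shown in this request.

```typescript
// Hypothetical filter applied to text before it is sent for speech synthesis.
function stripAnnotations(text: string): string {
  return text
    // CriticMarkup comments: {>> ... <<}
    .replace(/\{>>[\s\S]*?<<\}/g, "")
    // Markdown footnote references: [^1]
    .replace(/\[\^[^\]]+\]/g, "")
    // Bracketed citekeys: [@authortitleyear, page]
    .replace(/\[\s*@[^\]]*\]/g, "")
    // Remove the space left in front of punctuation
    .replace(/[ \t]+([.,;:!?])/g, "$1")
    // Collapse leftover double spaces
    .replace(/[ \t]{2,}/g, " ")
    .trim();
}
```

For example, `"This is known [@smith2020, 12].[^1] {>> check this <<}Next sentence."` becomes `"This is known. Next sentence."`. Footnote definitions at the bottom of a note are a separate case and would need their own rule.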

Feature Request/Bug Report: Can't play Dataview generated notes

Hey there,

I use Dataview to generate a document of randomly ordered "tidbits" from a folder. But when I highlight the Dataview text, Aloud says "No text selected to speak". If it's possible to fix (or if there's a workaround?), that'd be amazing.

Thanks!

Speed control

This is probably something you're already planning, but can you add a setting to control the speed?

There is a speed option in the OpenAI text-to-speech API:
speed: The speed of the generated audio. Select a value from 0.25 to 4.0. 1.0 is the default.

Thanks for the plugin, by the way. It was a missing piece in my workflow.
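If the setting accepts free-form input, the range quoted above (0.25 to 4.0, default 1.0) suggests a simple validation step. A sketch, with `normalizeSpeed` as a hypothetical helper name:

```typescript
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;
const DEFAULT_SPEED = 1.0;

// Clamps a user-entered speed into the supported range and rounds to two decimals.
// Non-numeric input (NaN, Infinity) falls back to the default.
function normalizeSpeed(input: number): number {
  if (!Number.isFinite(input)) return DEFAULT_SPEED;
  const clamped = Math.min(MAX_SPEED, Math.max(MIN_SPEED, input));
  return Math.round(clamped * 100) / 100;
}
```

This would also allow in-between values such as 1.15 rather than a fixed list of presets.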

Intermittent issues with no sound on iOS

Hi, thanks for a great plugin! It works great on Mac, but on iOS, there's neither sound, nor controls when I press "Aloud: text to speech".

Since I sync with Working Copy, I can confirm that the sound files are being generated in the .tts directory and can be played directly from there (the audio content matches the text in the note).

iOS Version: 17.2.1
Model: iPhone 11 Pro Max

Feature Request: Read Note

Right now you have to select text and then run the command. I'd like to have a command that will just directly read the entirety of the current document, without having to first "select-all." Thanks.

Typo mistake

Change playback rate with web-audio API rather than OpenAI parameter

See #28 for original context

OpenAI has a speed option in their API that the plug-in is using. This was just the most straightforward way to implement the feature. I haven't been impressed with the quality of the non-1x audio though. It seems like it's lower quality? There's a playbackRate parameter in web audio that this could use (example); however, it ends up changing the pitch of the audio as well, so there'd be more involved here than just tuning that parameter. Perhaps the detune parameter could counter-balance the pitch change.
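On the detune idea: as far as I understand the Web Audio specification, an `AudioBufferSourceNode` combines `playbackRate` and `detune` into a single effective rate, so a detune that cancels the pitch shift would cancel the speed-up as well. The arithmetic can be sketched as below; the function names are hypothetical and the code only models the formula, it does not touch any audio nodes.

```typescript
// Effective rate of an AudioBufferSourceNode, per my reading of the Web Audio spec:
// playbackRate multiplied by 2^(detune / 1200), with detune expressed in cents.
function computedPlaybackRate(playbackRate: number, detuneCents: number): number {
  return playbackRate * Math.pow(2, detuneCents / 1200);
}

// The detune (in cents) that would exactly undo the pitch change of a given rate.
function detuneToCancelPitch(playbackRate: number): number {
  return -1200 * Math.log2(playbackRate);
}
```

Plugging in 1.5x with its cancelling detune gives an effective rate of 1.0, which is normal speed again. Pitch-preserving speed changes would therefore need a time-stretching step, or playback through an `HTMLMediaElement`, whose `playbackRate` can be used together with its `preservesPitch` property.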

Not working on OSX

Hi, the plugin doesn't work for mac with this error in console:

app.js:1 Uncaught (in promise) Error: Request failed, status 429
    at new t (app.js:1:1977769)
    at qG (app.js:1:1977961)
    at app.js:1:1978638
    at app.js:1:237056
    at Object.next (app.js:1:237161)
    at a (app.js:1:235879)

I have the latest version (0.2.0) and Obsidian version 1.5.12.

Let me know if you need any additional info.

Help in using opendai with aloud plugin

Need a quick guide to set it up with opendai.speech.

I have been trying for 10 hours. opendai.speech is working, but I couldn't link it to the Aloud plugin yet.

I tried pasting the URL and it gives a "parlor tts not found" error, while I have another TTS (piper) set up.

Configurable toggle for an "always on" player

On mobile, it's harder to access and configure commands. It would be nice to have the player always open on mobile.