Comments (10)
I'm ready to show you a draft version of the requested feature. Surprisingly, it wasn't very difficult to implement. Maybe I missed something 🤔.
You can test it by installing Speech Note from the 'flathub-beta' channel. If you are not familiar with 'flathub-beta', please follow this manual.
This new version comes with the following settings options:
Use global keyboard shortcuts (disabled by default)
- You can define shortcuts here for starting listening and canceling. The app automatically registers these shortcuts in the compositor, so make sure they don't interfere with other shortcuts already defined in your system.
- Shortcuts are global, which means they also work when the app is in the background (e.g. minimized).
- The shortcut 'Listen, text to active window' does the magic 🪄. When speech is decoded, the text is inserted into the currently active window. It could be another text editor, a search bar in a browser, a terminal window or anything else.
Show desktop notification
- By default, when listening starts and the app is in the background, a desktop notification is shown to indicate the state of processing ('Say something..', 'Processing, wait..' etc).
- When the STT model supports 'intermediate results' (Vosk or Coqui/DeepSpeech), the notification also contains the partially decoded text.
Current limitations:
- Global shortcuts and 'text inserting to active window' work only on X11, so if you are on Wayland they won't work.
- Most likely, 'text inserting to active window' doesn't work for non-Latin scripts (e.g. Cyrillic, Greek, Arabic...). I'm working to fix this.
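Since the feature is limited to X11, it can help to confirm the session type before testing. The following is a minimal sketch, not part of Speech Note: it only reads the standard `XDG_SESSION_TYPE` environment variable, and the function names are hypothetical. Some setups do not set this variable, in which case the sketch reports 'unknown'.

```python
import os


def session_type(env=None) -> str:
    """Return the session type reported by the login session, lowercased.

    Typical values are 'x11' and 'wayland'. Returns 'unknown' when the
    variable is missing or empty.
    """
    env = os.environ if env is None else env
    value = env.get("XDG_SESSION_TYPE", "").strip().lower()
    return value or "unknown"


def is_x11_session(env=None) -> bool:
    """True only when the session explicitly reports X11."""
    return session_type(env) == "x11"


if __name__ == "__main__":
    kind = session_type()
    print(f"Session type: {kind}")
    if not is_x11_session():
        print("Global shortcuts and text insertion are described as X11-only.")
```

Run it in a regular terminal on the desktop session you plan to test in, outside of any sandbox that might hide the variable.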
To test the new feature:
- start the beta version of the app (flatpak run net.mkiol.SpeechNote//beta)
- enable 'Use global keyboard shortcuts' in the settings (change the key combinations if you don't like the default ones)
- set the preferred STT model
- open any other window with a text field
- press Ctrl+Alt+Shift+K (the default shortcut for 'Listen, text to active window')
- say something and wait
- the decoded text should be inserted in the text field where the cursor was focused
I would be very grateful for your feedback.
from dsnote.
The updated feature is implemented in the "beta" release.
Not everything works well, but the "core" functionality is ready. Unfortunately, I wasn't able to convince gedit to cooperate ;-) but everything else seems to work fine. Configuration is in the 'Settings->Accessibility' tab.
You can install and test the "beta" version from the flathub-beta channel.
from dsnote.
Thank you for the very interesting idea.
I fully understand the need for this kind of feature.
Actually, the Sailfish OS version of Speech Note (for mobile phones) is integrated into the virtual keyboard, so the user can, instead of typing, press the "Speech-to-Text" button and insert text into any text field in the system. This is possible because the phone platform has a software keyboard which can be modified.
Right now I don't know how a similar feature could be implemented on the Linux desktop. Maybe this can only be done by an OS or UI toolkit vendor (e.g. KDE or Qt)... Honestly, I don't know.
I will investigate the possibilities and report any findings here. Stay tuned.
from dsnote.
Thank you for the prompt reply. There are similar projects that I've found, such as whisper-writer by savbell, which allows for a similar integration, but for now it seems way too techy for a plebeian like me to wrap my head around (i.e. I haven't gotten it to work yet).
I will definitely stay tuned, thank you again for the reply!
from dsnote.
This would be great! Just one idea: maybe it runs as a background DE extension (e.g. a GNOME extension). While in any app, I select a text input field and hit a hotkey to enable speech detection in the Speech Note background app. It then does STT, and once I stop talking it simulates "paste" to put the text into the selected text field.
from dsnote.
Yet another idea is to allow running it in a terminal: it starts automatically in speech recording mode and, once you stop talking, returns the text to the terminal. This way someone could script it to run and paste its output.
from dsnote.
Thanks for all the ideas 👍🏿
I'm working on a prototype...
from dsnote.
I tested it in Kitty terminal, the Firefox browser and gedit on Ubuntu 22.04 with X11. It's quite good. What I noticed:
- The default hotkey (Ctrl+Alt+Shift+K) worked fine for activating detection in another focused window.
- Works perfectly in Kitty terminal.
- Did not insert any text in a gedit window for some reason.
- Inserted text into Firefox text entry fields on web pages and into the Firefox URL/search bar, BUT there it frequently drops letters when inserting the text. For example, I said "Testing Testing One Two Three Four", and it always gets it right in Kitty terminal and in the Speech Note GUI, but in Firefox it's usually like:
Testing, tsting. Oe, two, three, four.
Testing, esting 1,2,3,4.
Testing, testing, on, tw, three, four.
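When reporting dropped letters like these, a character-level diff between the text shown in the Speech Note GUI and the text that landed in the target window can make the pattern easier to see. The following is a minimal sketch using only Python's standard `difflib`; the function name `dropped_chars` is hypothetical and this is not how Speech Note itself checks anything.

```python
import difflib


def dropped_chars(expected: str, actual: str) -> list[tuple[int, str]]:
    """List spans present in `expected` but missing from `actual`.

    Each entry is (index in expected, missing text). Only pure deletions
    are reported; replaced or inserted characters are ignored.
    """
    matcher = difflib.SequenceMatcher(None, expected, actual, autojunk=False)
    missing = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            missing.append((i1, expected[i1:i2]))
    return missing


if __name__ == "__main__":
    print(dropped_chars("testing", "tsting"))  # → [(1, 'e')]
```

Comparing several runs this way could show whether the drops cluster at particular positions or characters, which may help narrow down where the insertion goes wrong.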
from dsnote.
More testing:
- Worked perfectly in OnlyOffice
- Did not insert any text in a LibreOffice Writer doc
- Worked perfectly in VS Code; however, I needed to change the hotkey so that it did not use Alt, because Alt was moving focus from the text document to the File menu.
from dsnote.
Thank you so much for the tests.
I'm unable to reproduce these problems with missing letters. For some reason, everything works fine on my system. Maybe it is a GNOME or Ubuntu thing. I have no choice but to install Ubuntu now 😉.
from dsnote.