Comments (10)

mkiol commented on May 18, 2024

I'm ready to show you a draft version of the requested feature. Surprisingly, it wasn't very difficult to implement. Maybe I missed something 🤔.

You can test it by installing Speech Note from the 'flathub-beta' channel. If you are not familiar with 'flathub-beta', please follow this manual.

This new version comes with the following settings:

  • Use global keyboard shortcuts (disabled by default)
    • Here you can define shortcuts for starting listening and for canceling. The app automatically registers these shortcuts in the compositor, so make sure they don't conflict with shortcuts already defined in your system.
    • Shortcuts are global, which means they also work when the app is in the background (e.g. minimized).
    • The shortcut Listen, text to active window does the magic 🪄. When speech is decoded, the text is inserted into the currently active window. It could be another text editor, the search bar in a browser, a terminal window or anything else.
  • Show desktop notification
    • By default, when listening starts and the app is in the background, a desktop notification is shown to indicate the processing state ('Say something..', 'Processing, wait..' etc.).
    • When the STT model supports 'intermediate results' (Vosk or Coqui/DeepSpeech), the notification also contains the partially decoded text.

Current limitations:

  • Global shortcuts and 'text inserting to active window' work only on X11, so they won't work if you are on Wayland.
  • Most likely, 'text inserting to active window' doesn't work for non-Latin scripts (e.g. Cyrillic, Greek, Arabic...). I'm working to fix this.
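
Given the X11-only limitation above, a wrapper script could check the session type before relying on the shortcuts. The following is a minimal sketch and not part of Speech Note: it assumes the desktop session exports the XDG_SESSION_TYPE environment variable (usually set at login, but not guaranteed), and both function names are hypothetical.

```python
import os


def session_type(env=None):
    """Return the session type reported by XDG_SESSION_TYPE, lowercased.

    Returns an empty string when the variable is not set.
    `env` is a mapping used instead of os.environ (handy for testing).
    """
    env = os.environ if env is None else env
    return env.get("XDG_SESSION_TYPE", "").strip().lower()


def x11_features_expected(env=None):
    """True only when the session explicitly reports X11.

    Assumption: anything else (wayland, tty, unset) is treated as
    'global shortcuts and text insertion will probably not work'.
    """
    return session_type(env) == "x11"


# Example use:
#   if not x11_features_expected():
#       print("Not an X11 session; global shortcuts may not work.")
```

The check is only a hint: an unset variable does not prove the session is Wayland, so the sketch reports "not expected" rather than failing.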

To test the new feature:

  • Start the beta version of the app (flatpak run net.mkiol.SpeechNote//beta)
  • Enable Use global keyboard shortcuts in the settings (change the key combinations if you don't like the default ones)
  • Select your preferred STT model
  • Open any other window with a text field
  • Press Ctrl+Alt+Shift+K (the default shortcut for Listen, text to active window)
  • Say something and wait
  • The decoded text should be inserted into the text field that had focus

I would be very grateful for your feedback.

from dsnote.

mkiol commented on May 18, 2024

The updated feature is implemented in the "beta" release.

Not everything works well, but the "core" functionality is ready. Unfortunately, I wasn't able to convince gedit to cooperate ;-).. but everything else seems to work fine. Configuration is in the 'Settings->Accessibility' tab.

You can install and test the "beta" version from the flathub-beta channel.

mkiol commented on May 18, 2024

Thank you for the very interesting idea.

I fully understand the need for this kind of feature.

Actually, the Sailfish OS version of Speech Note (for mobile phones) is integrated into the virtual keyboard, so the user can, instead of typing, press the "Speech-to-Text" button and insert text into any text field in the system. This is possible because the phone platform has a software keyboard which can be modified.

Right now I don't know how a similar feature could be implemented on the Linux desktop. Maybe it can only be done by the OS or UI toolkit vendor (e.g. KDE or Qt)... Honestly, I don't know.

I will investigate the possibilities and report any findings here. Stay tuned.

not-the-twiggs-youre-looking-for commented on May 18, 2024

Thank you for the prompt reply. There are similar projects that I've found, such as whisper-writer by savbell, which allows for a similar integration, but for now it seems way too techy for a plebeian like me to wrap my head around (i.e. I haven't gotten it to work yet).

I will definitely stay tuned, thank you again for the reply!

vijay-prema commented on May 18, 2024

This would be great! Just one idea: maybe it runs as a background DE extension (e.g. a GNOME extension). While in any app, I select a text input field and hit a hotkey to enable speech detection in the Speech Note background app; it then does STT, and once I stop talking it simulates "paste" to insert the text into the selected text field.

vijay-prema commented on May 18, 2024

Yet another idea is to allow running it in a terminal: it starts automatically in speech recording mode and, once I stop talking, returns the text to the terminal. This way someone could script it to run and paste its output.
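
The scripting idea above could be sketched roughly as follows. This is an illustration only and not how Speech Note inserts text: it assumes the separate xdotool utility is installed, that the session is X11, and that the recognized text is already available as a string. The helper names and the default delay are hypothetical.

```python
import shutil
import subprocess


def build_type_command(text, delay_ms=50):
    """Build an xdotool command that types `text` into the focused window.

    --clearmodifiers releases held modifier keys (e.g. from a hotkey)
    while typing; --delay sets the pause between keystrokes in ms.
    """
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    # "--" ends option parsing so text starting with "-" is typed as-is.
    return ["xdotool", "type", "--clearmodifiers",
            "--delay", str(delay_ms), "--", text]


def type_into_active_window(text, delay_ms=50):
    """Run xdotool to type `text`; raises if xdotool is missing or fails."""
    if shutil.which("xdotool") is None:
        raise RuntimeError("xdotool not found (assumed dependency)")
    subprocess.run(build_type_command(text, delay_ms), check=True)
```

A script could call type_into_active_window() with whatever text a speech-to-text tool printed to its standard output. Simulating keystrokes instead of "paste" leaves the clipboard untouched, at the cost of depending on how each application handles synthetic key events.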

mkiol commented on May 18, 2024

Thanks for all the ideas 👍🏿

I'm working on a prototype...

vijay-prema commented on May 18, 2024

I tested it in the Kitty terminal, the Firefox browser and gedit on Ubuntu 22.04 (X11). It's quite good. What I noticed:

  • The default hotkey (Ctrl+Alt+Shift+K) worked fine for activating detection in another focused window.
  • Works perfectly in Kitty terminal.
  • Did not insert any text in a gedit window for some reason.
  • Inserted text into Firefox text entry fields on web pages and into the Firefox URL/search bar, BUT there it frequently drops letters when inserting the text. For example, I said "Testing Testing One Two Three Four", and it always gets it right in the Kitty terminal and in the Speech Note GUI, but in Firefox it usually looks like:
Testing, tsting. Oe, two, three, four.
Testing, esting 1,2,3,4.
Testing, testing, on, tw, three, four.

vijay-prema commented on May 18, 2024

More testing:

  • Worked perfectly in OnlyOffice.
  • Did not insert any text in a LibreOffice Writer document.
  • Worked perfectly in VS Code; however, I needed to change the hotkey so that it did not use Alt, because Alt was moving focus from the text document to the File menu.

mkiol commented on May 18, 2024

Thank you so much for the tests.

I'm unable to reproduce the problem with missing letters. For some reason, everything works fine on my system. Maybe it is a GNOME or Ubuntu thing. I have no choice but to install Ubuntu now 😉.
