
Comments (10)

h3ndrik commented on August 24, 2024

I'm a fan of your work. I've used Rhasspy and Romkabouter's ESP32 Satellite before, though nothing properly productive, mainly tinkering. I'm currently digging into how this one works.

My general thoughts are: Wow boy is there much work to do on the microcontroller side... Silence detection and VAD only kind of work with the ESP-ADF (which unfortunately isn't open source), and the media_player component and some others don't work with the ESP-IDF requirement. openWakeWord works, but it tears down the pipeline and fires on_start and on_end every few seconds. And I'm still debugging, so my hard disk is filling up with recordings of silence and random stuff. I'd like this to be easier for someone who is new to all this. But... I even managed to train my own wake word. That's awesome.

Perspectively I'd like something like all the cool signal processing that's available in the big voice assistants. Being able to play music and subtract the output from the microphone so we can simultaneously listen to music and instruct it to stop. Have microphone arrays and far-field voice control, beam-forming and speaker recognition available. I suppose you've at one point seen how the Amazon bugging devices work, the signal processing really adds to the real-world usability. But we're still missing the absolute basics here. (I always preferred projects like ESP8266Audio to the ESP-ADF because it's free software. But there is no signal processing available and it's mainly for outputting sound.)

Whisper is a bit slow on my old server. And I really liked the idea of constraining the STT to the predefined sentences. (I'm currently porting things over from my Rhasspy add-on config, but I'm still struggling with ESPHome instead.) It immediately makes it blazing fast and does away with problems like a preposition not being transcribed correctly. For a wider audience it would be great if the sentences came from what HA is able to understand (automatically). But that doesn't seem to be the main concern at this point. (And I've played with VOSK before. It's really easy to write a few shell scripts or small Python scripts to integrate it into your own small projects. I had tied it into an Asterisk telephony server at some point.)

I think I'm going to file more bug reports once I get to dig into the VOSK add-on. Currently the in/out replace doesn't work for me. It always returns the fixed sentence, but without the replacement applied. I'd need that for words that aren't in the keywords file (like the 'loo mo ss' example), only with German compound words that get blanks inserted in between.

My main use case would be a voice assistant for the kitchen that can play music, set timers, tell you a joke, and announce the weather in the morning, delays on public transport, and the day's birthdays and appointments. And add things to the shopping list. I take it for granted that I can also turn some lights in the house on and off. (I'd scatter a few more ESP32s around to announce things in other rooms and play music, once it becomes useful.) And the last thing, I'm fooling around with LLMs (Artificial intelligence). An AI agent could give the house a proper personality and be tied into HA to control everything, like the ship's computer on Star Trek does. That's maybe something to consider for the year after the Year of the Voice.

from hassio-addons.

h3ndrik commented on August 24, 2024

fix the in/out replace issue

Thank you very much. Can confirm it works and I've closed that issue.

Espressif [...] keep deprecating boards by the time I get something working on them

Hehe. I still have some older ESP32 boards (not by Espressif) in my drawer, mainly because I like to start hobby projects and don't finish them. But sometimes I pull out something like the old TAudio board, which I'm currently testing this on.

[signal processing] I don't know that we'll ever get there, honestly. Maybe if the big players give up fully on voice and sell their tech to someone willing to make chips that the rest of us can use.

Sadly I don't know much about signal processing. I've searched the internet for libraries and algorithms for noise suppression, echo cancellation and voice stuff. Seems there isn't anything good available to tinkerers like me. Mostly companies selling their proprietary solutions and DSPs. I'd like to get some microphone array board, but it would need to come with the signal processing already implemented. (And in a way that allows me to poke around.)
ESPHome supports only the basics when it comes to audio. I'd like to see more implemented there. I've opened PR esphome/esphome#5613 to hopefully learn a bit and have a place to start.

[...] next year we'll be able to run a local LLM

Things are still moving crazy fast. I run Home Assistant on an (old) server, so I'm not as constrained as someone with a single-board computer would be. The server doesn't have a GPU, but I can run llama.cpp in a different virtual machine, and I'm willing to connect it to the smart home at some point. I'm aware of llama.cpp's feature for constraining output to a grammar, for example to produce JSON.

In my opinion smaller models like Mistral 7B are surprisingly capable and still fast on a regular computer. They also know a lot of things, probably enough to be able to interact with me. I think with models the size of Microsoft's phi-1 (but tuned for this use case) we could run this on a single-board computer.

I'm still not completely sold on the idea of having LLMs and smart assistants in my life. They're nice, but on the other hand I can already do lots of stuff the way it is.


synesthesiam commented on August 24, 2024

You may need to quote [der|die|das] in the YAML. It's probably interpreting it as a list.

expansion_rules:
  artikel: "[der|die|das]"


h3ndrik commented on August 24, 2024

Ah, nice. Works now. Thank you very much. You might want to add that to the example in "vosk/DOCS.md" here and in the "README.md" of wyoming-vosk.

I've tried it exactly like it's written there and that also didn't work.
I'm going to leave this issue open in case you want to update the documentation. But it's solved for me, so feel free to close it.


synesthesiam commented on August 24, 2024

Thanks! I'll update the example and the docs.

Any thoughts on the add-on itself? Can you share what your use case is maybe? I haven't promoted it at all yet. I'm thinking of making a tutorial video.


synesthesiam commented on August 24, 2024

Thanks for the feedback @h3ndrik! I've updated the Vosk add-on to (hopefully) fix the in/out replace issue.

My general thoughts are: Wow boy is there much work to do on the microcontroller side

Agreed. Hardware is so varied and moving so fast that it's hard to make progress. With Espressif especially, they keep deprecating boards by the time I get something working on them 😄

Perspectively I'd like something like all the cool signal processing that's available in the big voice assistants.

I got an Echo Dot for testing and wow, it can hear you through just about anything. I don't know that we'll ever get there, honestly. Maybe if the big players give up fully on voice and sell their tech to someone willing to make chips that the rest of us can use.

For a wider audience it would be great if the sentences came from what HA is able to understand (automatically).

This is the plan, actually. I need an API on the Home Assistant side to get the entities and areas that have been exposed to Assist. With that, I can just plug those lists into the default intents and generate the possible sentences.

I'm fooling around with LLMs (Artificial intelligence).

They're getting faster and faster, so I'm hopeful that next year we'll be able to run a local LLM and use it with Home Assistant. I'm seeing more experiments where they constrain the LLM to produce JSON, for example. That would let you interface it to HA much more easily, and still produce interesting responses (inside the JSON).
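As a rough illustration of why JSON output is easier to interface with, here is a minimal sketch that validates a model reply before anything is acted on. The field names (`service`, `entity_id`, `speech`), the allow-list, and `parse_reply` are all hypothetical; nothing here is an existing Home Assistant or llama.cpp API.

```python
import json

# Hypothetical allow-list: only these actions may ever be triggered.
ALLOWED_SERVICES = {"light.turn_on", "light.turn_off"}


def parse_reply(raw):
    """Validate a JSON reply and return (service, entity_id, speech).

    Raises ValueError for malformed JSON or unexpected content, so a bad
    reply never turns into an action.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    service = data.get("service")
    entity_id = data.get("entity_id")
    speech = data.get("speech", "")

    if service not in ALLOWED_SERVICES:
        raise ValueError(f"service not allowed: {service!r}")
    if not isinstance(entity_id, str) or "." not in entity_id:
        raise ValueError(f"bad entity_id: {entity_id!r}")
    if not isinstance(speech, str):
        raise ValueError("speech must be a string")
    return service, entity_id, speech


reply = '{"service": "light.turn_on", "entity_id": "light.kitchen", "speech": "Let there be light."}'
print(parse_reply(reply))
```

The free-form part of the answer lives in `speech`, while the part that touches the house is checked against a fixed list.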

Thanks again for testing and following my work!


h3ndrik commented on August 24, 2024

Wow, the expansion rules expand fast. I've added the HA intent sentences for turning devices and lights on and off and for setting brightness and color to the Vosk sentences, with optional articles, prepositions and areas.

Now it says "Loading /share/vosk/sentences/de.yaml" for a minute and then the Vosk Addon kills the async event handler ;-)

It stopped displaying the list once it reached a four- or five-digit length... Neither limiting sentences nor correcting them copes with that amount.

I don't know enough about Vosk to make any recommendations here. But it seems that ingesting that sentence file at runtime doesn't scale anywhere close to real-world usage.
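To make the growth concrete, here is a minimal sketch that expands a template written in a simplified version of the bracket syntax used in this thread: `[a|b]` is optional, `(a|b)` is a required choice. It is not the add-on's actual parser; `expand`, `TEMPLATE` and the helper names are made up for illustration, and `<rule>` and `{list}` references are not handled.

```python
def expand(template):
    """Return every sentence a template can produce (simplified syntax)."""
    results, pos = _alternatives(template, 0)
    if pos != len(template):
        raise ValueError(f"unexpected {template[pos]!r} at position {pos}")
    # Collapse whitespace left behind by skipped optional parts; drop duplicates.
    return sorted({" ".join(s.split()) for s in results})


def _alternatives(text, pos):
    """Parse `seq ('|' seq)*` and return (list of strings, new position)."""
    options, pos = _sequence(text, pos)
    while pos < len(text) and text[pos] == "|":
        more, pos = _sequence(text, pos + 1)
        options += more
    return options, pos


def _sequence(text, pos):
    """Parse literal characters and groups until '|', ')' or ']'."""
    strings = [""]
    while pos < len(text) and text[pos] not in "|)]":
        ch = text[pos]
        if ch in "([":
            close = ")" if ch == "(" else "]"
            parts, pos = _alternatives(text, pos + 1)
            if pos >= len(text) or text[pos] != close:
                raise ValueError("unbalanced group")
            pos += 1
            if ch == "[":
                parts = parts + [""]  # an optional group may also be skipped
        else:
            parts = [ch]
            pos += 1
        # Cartesian product: every prefix combined with every continuation.
        strings = [a + b for a in strings for b in parts]
    return strings, pos


TEMPLATE = "dimme [[der|die|das] Helligkeit [von|vom]] licht [auf|zu] fünf [Prozent]"
print(len(expand(TEMPLATE)))
```

This single template, with one fixed device and one fixed brightness value, already yields 78 sentences. Every additional optional group, article rule or list entry multiplies that number.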

I've gone back to Faster-Whisper, but it always gets most of a sentence right and one character or word wrong. ("Schalte das Wohnzimmerlicht ein" -> "Schalte das Wohnzimmer nicht ein", i.e. "Turn on the living room light" becomes "Don't turn on the living room".) Meh.


synesthesiam commented on August 24, 2024

Can you post the YAML here so I can benchmark it?


synesthesiam commented on August 24, 2024

Update: I've switched to using an SQLite database to store the sentences, and only giving Vosk the available words. On a Raspberry Pi 4, it only takes 1.34 seconds to generate 22,786 sentences, and 0.01 seconds to load the recognizer.
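For readers following along, the idea can be sketched roughly as below. This is not the add-on's code: the table layout and the names `build_sentence_db`, `vocabulary` and `lookup` are assumptions chosen for illustration. Sentences go into SQLite once, the recognizer only needs the set of distinct words, and a finished transcript is resolved with an indexed lookup.

```python
import sqlite3


def build_sentence_db(pairs, path=":memory:"):
    """Store (input sentence, output sentence) pairs; hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sentences ("
        "input TEXT PRIMARY KEY, output TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sentences (input, output) VALUES (?, ?)", pairs
    )
    conn.commit()
    return conn


def vocabulary(conn):
    """Collect the distinct words across all stored input sentences."""
    words = set()
    for (text,) in conn.execute("SELECT input FROM sentences"):
        words.update(text.split())
    return sorted(words)


def lookup(conn, transcript):
    """Map a recognized transcript to its output sentence, or None."""
    row = conn.execute(
        "SELECT output FROM sentences WHERE input = ?", (transcript,)
    ).fetchone()
    return row[0] if row else None


conn = build_sentence_db(
    [
        ("schalte licht ein", "Schalte Deckenlicht Wohnzimmer ein"),
        ("schalte licht aus", "Schalte Deckenlicht Wohnzimmer aus"),
    ]
)
print(vocabulary(conn))
print(lookup(conn, "schalte licht ein"))
```

The word list grows far more slowly than the sentence list, which would fit with the recognizer loading quickly even when tens of thousands of sentences are stored.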


h3ndrik commented on August 24, 2024

Well, I can still make it hang for a few minutes if I try something like the following (setting brightness in percent). After that, some async worker generates an error message, but at least it seems to generate the SQLite database for the next pipeline run.

sentences:
# light_HassLightSet
  - "<setzen> [<artikel>] Helligkeit von <name> auf {brightness} [Prozent] [ein]"
  - "[<artikel>] Helligkeit von <name> auf {brightness} [Prozent] <setzen>"
  - "dimme [[<artikel>] Helligkeit [von|vom] [<artikel>]] <name> [auf|zu] {brightness} [Prozent]"
  - "<name> [auf|zu] {brightness} [Prozent] dimmen"
#  - in: "dimme <name>"
#    out: "Setze Helligkeit von <name> auf 25"
lists:
  device:
    values:
      - in: fernseher
        out: Wohnzimmer TV
      - in: licht
        out: Deckenlicht Wohnzimmer
      - in: wohnzimmer licht
        out: Wohnzimmerlicht
      - in: deko licht
        out: Dekolicht
      - in: flur licht
        out: Flurlicht
      - in: licht am esstisch
        out: Esstischbeleuchtung
      - in: küchen beleuchtung
        out: Küchenbeleuchtung
      - in: licht in der küche
        out: Küchenbeleuchtung
  brightness:
    values:
      - in: ein
        out: 1
      - in: eins
        out: 1
      - in: fünf
        out: 5
      - in: zehn
        out: 10
      - in: fünfzehn
        out: 15
      - in: zwanzig
        out: 20
      - in: fünfundzwanzig
        out: 25
      - in: dreißig
        out: 30
      - in: vierzig
        out: 40
      - in: fünfzig
        out: 50
      - in: sechzig
        out: 60
      - in: siebzig
        out: 70
      - in: fünfundsiebzig
        out: 75
      - in: achtzig
        out: 80
      - in: fünfundachtzig
        out: 85
      - in: neunzig
        out: 90
      - in: fünfundneunzig
        out: 95
      - in: neunundneunzig
        out: 99
      - in: hundert
        out: 100
  color:
    values:
      - in: "wei(ß|ss)"
        out: "white"
      - in: "schwarz"
        out: "black"
      - in: "rot"
        out: "red"
      - in: "orange"
        out: "orange"
      - in: "gelb"
        out: "yellow"
      - in: "grün"
        out: "green"
      - in: "blau"
        out: "blue"
      - in: "violett"
        out: "purple"
      - in: "lila"
        out: "purple"
      - in: "braun"
        out: "brown"

expansion_rules:
  artikel_bestimmt: "(der|die|das|dem|den|des)"
  artikel_unbestimmt: "(ein|eine|eines|einer|einem|einen)"
  artikel: "(<artikel_bestimmt>|<artikel_unbestimmt>)"
  name: "[<artikel>] {device}"
  setzen: "(setz[e|en]|stell[e|en]|einstellen|änder[e|n]|veränder[e|n])"
  licht: "[<artikel>] (Licht|Lampe|Beleuchtung)"
  brightness: "{brightness} [Prozent]"

