mimiuchi: speech-to-text

mimiuchi is a free, customizable, OSC-capable speech-to-text application for displaying text or relaying it to other applications like VRChat. Its customizable text window is also designed to be paired with applications like OBS. It runs on the web, with little setup required beyond customization. You can try it out right now at mimiuchi.com with Chrome, Safari, or Edge. The UI currently supports English and Japanese (日本語)!

Features

  • Speech-to-text
  • Text-to-speech
  • On-device translations
  • OSC broadcasting (for apps like VRChat)
  • (WIP) Custom OSC parameter execution via language triggers ("turn my marker on" -> /avatar/parameters/Marker True)
  • ...and many settings to customize the experience!
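Language triggers are still a work in progress, so the following is only a hypothetical sketch of how a finished transcript could be matched against phrases and turned into OSC parameter actions. The OscTrigger/OscAction shapes, the example regex patterns, and matchTriggers are illustrative names, not mimiuchi's code; the only external fact assumed is that VRChat avatar parameters are addressed under /avatar/parameters/.

```typescript
// Hypothetical trigger definition: a phrase pattern mapped to an OSC address and value.
interface OscTrigger {
  pattern: RegExp;
  address: string;
  value: boolean;
}

// The action to send once a trigger phrase is recognized.
interface OscAction {
  address: string;
  value: boolean;
}

// Illustrative triggers only. VRChat avatar parameters live under /avatar/parameters/.
const exampleTriggers: OscTrigger[] = [
  { pattern: /\bturn my marker on\b/i, address: "/avatar/parameters/Marker", value: true },
  { pattern: /\bturn my marker off\b/i, address: "/avatar/parameters/Marker", value: false },
];

// Returns the actions whose pattern appears anywhere in the transcript (case-insensitive).
function matchTriggers(transcript: string, triggers: OscTrigger[]): OscAction[] {
  return triggers
    .filter((t) => t.pattern.test(transcript))
    .map((t) => ({ address: t.address, value: t.value }));
}
```

One design choice worth considering: matching only final (not interim) speech results would keep a half-recognized phrase from firing the same parameter more than once.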

How to use

Speech-to-Text

Simply go to mimiuchi.com and press the mic button! You will need to grant microphone access the first time. mimiuchi currently uses the Web Speech API to perform speech-to-text, which is only supported in the web version; you can read more about it below. In the future I will support more options.
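For readers curious how browser speech-to-text is driven, here is a minimal sketch of continuous recognition with interim results using the standard Web Speech API (SpeechRecognition, often exposed as webkitSpeechRecognition, with its continuous, interimResults, lang, onresult and start members). It is not mimiuchi's implementation; splitResults, startRecognition, and the small "Like" interfaces are hypothetical names, and the scope parameter exists only so the sketch can run outside a browser.

```typescript
// Minimal structural types for the parts of the Web Speech API used here,
// so the sketch compiles without DOM typings.
interface SpeechAlternativeLike {
  transcript: string;
}
interface SpeechResultLike {
  isFinal: boolean;
  length: number;
  [index: number]: SpeechAlternativeLike;
}
interface SpeechEventLike {
  resultIndex: number;
  results: ArrayLike<SpeechResultLike>;
}

// Splits the results that changed in this event into final and interim text.
function splitResults(event: SpeechEventLike): { finalText: string; interimText: string } {
  let finalText = "";
  let interimText = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const best = result[0].transcript; // the first alternative is the most likely one
    if (result.isFinal) finalText += best;
    else interimText += best;
  }
  return { finalText, interimText };
}

// Starts continuous recognition if a recognizer constructor is available
// (often under the webkit prefix). Returns null when the constructor is missing.
function startRecognition(
  onText: (finalText: string, interimText: string) => void,
  lang = "en-US",
  scope: Record<string, any> = globalThis as unknown as Record<string, any>,
): any {
  const Recognition = scope.SpeechRecognition ?? scope.webkitSpeechRecognition;
  if (!Recognition) return null;
  const recognizer = new Recognition();
  recognizer.continuous = true; // keep listening across pauses
  recognizer.interimResults = true; // deliver partial text while the user is speaking
  recognizer.lang = lang;
  recognizer.onresult = (event: SpeechEventLike) => {
    const { finalText, interimText } = splitResults(event);
    onText(finalText, interimText);
  };
  recognizer.start();
  return recognizer;
}
```

In a browser, startRecognition((f, i) => console.log(f, i), "ja-JP") would start listening after the user grants microphone access; interim text can be shown immediately and replaced once a final result arrives.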

Using OSC

Click the broadcast button to toggle OSC. Because of how VRChat OSC works (VRChat receives OSC messages over UDP on the local machine, which a web page cannot send directly), this requires the desktop app, which you can download from the release page. If you're using speech-to-text, the web version can relay all speech-to-text output to the desktop app while broadcasting is on.
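To make the desktop app's job more concrete, here is a sketch of encoding an OSC 1.1 message for VRChat's chatbox address /chatbox/input, whose arguments are the text, a "send immediately" boolean, and an optional "play notification sound" boolean. This is an illustration, not mimiuchi's desktop code: pad4, oscString, encodeOscMessage, and chatboxMessage are hypothetical names, and the 144-character clip assumes VRChat's documented chatbox limit.

```typescript
// Rounds a byte count up to the next multiple of 4, the OSC alignment unit.
function pad4(n: number): number {
  return (n + 3) & ~3;
}

// Encodes an OSC string: UTF-8 bytes, at least one NUL terminator, zero-padded to 4 bytes.
function oscString(value: string): Uint8Array {
  const bytes = new TextEncoder().encode(value);
  const out = new Uint8Array(pad4(bytes.length + 1)); // new arrays are zero-filled
  out.set(bytes);
  return out;
}

type OscArg = string | boolean;

// Builds an OSC message: address, type-tag string, then argument data.
function encodeOscMessage(address: string, args: OscArg[]): Uint8Array {
  let tags = ",";
  const data: Uint8Array[] = [];
  for (const arg of args) {
    if (typeof arg === "string") {
      tags += "s";
      data.push(oscString(arg));
    } else {
      tags += arg ? "T" : "F"; // booleans live only in the type tag, with no data bytes
    }
  }
  const parts = [oscString(address), oscString(tags), ...data];
  const message = new Uint8Array(parts.reduce((sum, p) => sum + p.length, 0));
  let offset = 0;
  for (const part of parts) {
    message.set(part, offset);
    offset += part.length;
  }
  return message;
}

// VRChat chatbox: /chatbox/input (text, sendImmediately, playNotificationSound).
// Text is clipped to 144 characters, counted by code point rather than UTF-16 unit.
function chatboxMessage(text: string, notify = false): Uint8Array {
  const clipped = Array.from(text).slice(0, 144).join("");
  return encodeOscMessage("/chatbox/input", [clipped, true, notify]);
}
```

In a Node-based desktop app, the resulting bytes could be sent with the built-in dgram module, e.g. dgram.createSocket("udp4").send(chatboxMessage("hello"), 9000, "127.0.0.1"), since VRChat listens for OSC on UDP port 9000 by default.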

Everything together

With both applications running, simply toggle on the MIC and BROADCAST buttons in the web app. The desktop app will then be toggled on along with it.

website -> desktop

website -> desktop -> VRChat

Additional info

Why?

I support the idea of people having many ways to communicate and do things. It is important to give people those tools and make them easily accessible. This app gives people another way to display text in applications like OBS or VRChat. It is free, with privacy as an end goal. A very similar application is Web Captioner; however, I want to expand upon it and make this version unique!

Web Speech API

mimiuchi uses the Web Speech API to perform speech-to-text, which is a browser-dependent API. Most browsers, like Chrome or Edge, upload your audio to GCP or Azure respectively to have it processed, while the webpage never gets direct access to it. For example, you can read about Chrome's privacy practices for it here. I chose the Web Speech API because it is completely free and requires no accounts. Unfortunately, its free use is disabled in Electron's Chromium, so speech-to-text in this form can only run in the browser. This adds slight complexity when you want to interface with local applications like VRChat, because a "middle application" is required to relay the text back and forth. Still, I think this approach is worth it, as it gives people who don't have the means to pay a free way to use powerful speech-to-text models.

In the future, I would like to support a standalone desktop experience, but this is currently on hold until I figure out how popular it might be.

Todo

In no particular order:

  • more customization for text window
  • better intermediate text results
  • text-to-speech
  • more TTS/STT options (for standalone desktop experience)
  • VRChat text shader support (sending character data to float params)
  • add ability to export settings/transcripts
  • better webkit/safari support
  • Spotify support (maybe)
  • OBS WebSocket and 'text source' support
  • option for a second 'control panel'-type screen focused on quickly switching between settings
  • better generic OSC support
  • translation support
  • webhook/websocket customization to connect to other apps not made by me
  • documentation
  • SteamVR integration
  • continuous text transmission option for VRChat
  • locally run Whisper C++ bindings / WebGPU-based inference
    • This point is really important to me, because I want a truly low-latency, private STT system. But I want to make sure I do it the right way, so that it can work entirely in the browser, utilizing the full power of your GPU or CPU, completely locally and with minimal latency. A lot of this is very new, so it may take some time to iron out. The first versions may differ greatly from the end goal.

Download

See the release page to install the latest version of the desktop app. The desktop version lets you use additional features like OSC.

Building it yourself

Requirements

Setup

Run npm install to install dependencies.

Run npm run dev to start the application. This runs both an Electron version and a web version.

Alternatively, run npm run build to build the application. This creates an .exe file in release/.

Special Thanks

  • fuopy for the name, mimiuchi, which is borrowed from a project they made long ago!

License

This project is licensed under the GNU General Public License v3.0; see the LICENSE.txt file for details.

Contributors

naeruru, fuwako, jeremio, adrianpaniagualeon