fast-speech-to-text's Introduction

Fast speech to text / Real time multi-lingual voice chat

Repository containing experiment/proof-of-concept for a "real time", multilingual voice chat.

The basic idea is to utilize Web Speech API and Google Cloud Translate API to enable a voice chat application that can translate speech to multiple languages.
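As a rough sketch of that idea, the snippet below fans one recognised utterance out to every language used in a chat room. The `Translate` type and the `translateForRoom` function are hypothetical names introduced for illustration only; in the real project the `translate` argument would presumably wrap a call to the Cloud Translation API, and the transcript would come from the Web Speech API.

```typescript
// Hypothetical signature: translates `text` into one target language.
// In this sketch it stands in for a Cloud Translation call.
type Translate = (text: string, targetLanguage: string) => Promise<string>;

// Translate one recognised utterance into every other language used in a room.
async function translateForRoom(
  transcript: string,
  sourceLanguage: string,
  roomLanguages: string[],
  translate: Translate,
): Promise<Record<string, string>> {
  // Skip the speaker's own language and drop duplicate targets.
  const targets = roomLanguages.filter(
    (lang, i) => lang !== sourceLanguage && roomLanguages.indexOf(lang) === i,
  );

  // Run all translations concurrently.
  const pairs = await Promise.all(
    targets.map(
      async (lang): Promise<[string, string]> => [lang, await translate(transcript, lang)],
    ),
  );

  // The speaker's own language keeps the original transcript.
  const result: Record<string, string> = { [sourceLanguage]: transcript };
  for (const [lang, text] of pairs) {
    result[lang] = text;
  }
  return result;
}
```

Passing the translator in as an argument keeps the sketch independent of any particular client library and makes it easy to exercise with a stub.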

Note

The project currently runs properly only in the latest Chrome browser (v117 or later).

Running things locally

Pre-requisites

If you want to run the server on your own, you will need to:

  • set up a Google Cloud project & enable Cloud Translation API
  • set up a Firebase project and enable a Realtime Database

The project uses Application Default Credentials to authenticate with the Google Cloud Translation API and Firebase.

Before running the server, you will need to authenticate using the Google Cloud CLI. For the server to connect successfully to the Firebase Realtime Database, you'll also need to impersonate the Service Account that is used in GCP.

  • To set up user credentials using Google Cloud CLI, follow these instructions.
  • To impersonate a service account using Google Cloud CLI, follow these instructions. Make sure that your account has the Service Account Token Creator permission in GCP.

Once authenticated successfully, you can run the server with npm run -w server.

Note

This process has to be done ONLY once, as the credentials will be generated for you and kept in a "well-known" location. For more information, see How Application Default Credentials Work.

Environment Variables

You will need the following environment variable. Set it in a .env file in the server/ directory:

# server/.env

FIREBASE_RTDB_URL=
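A server could check this variable at startup so that a missing value fails fast. The following is only a minimal sketch: `readRtdbUrl` is a hypothetical helper, not part of the project, and the requirement that the value begins with `http://` or `https://` is an assumption made for illustration.

```typescript
// Hypothetical helper: validates FIREBASE_RTDB_URL before the server starts.
function readRtdbUrl(env: Record<string, string | undefined>): string {
  const raw = (env["FIREBASE_RTDB_URL"] ?? "").trim();
  if (raw === "") {
    throw new Error("FIREBASE_RTDB_URL is not set; add it to server/.env");
  }
  // Assumption: the database URL is an absolute http(s) URL.
  if (!/^https?:\/\//.test(raw)) {
    throw new Error("FIREBASE_RTDB_URL must start with http:// or https://");
  }
  // Drop trailing slashes so later path joins stay predictable.
  return raw.replace(/\/+$/, "");
}
```

In a Node.js server this would typically be called as `readRtdbUrl(process.env)` after the .env file has been loaded.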

Launching the application

# install all dependencies for the 'web' and 'server' packages
npm install

# run frontend and backend in dev
npm run dev
# or `npm run -w web` & `npm run -w server` in separate terminals if you so wish

Open localhost:5173 in your browser.

Demo application (NearForm access only)

There is a live demo application; however, it is accessible ONLY to people with a NearForm Google account. You can access it here.

fast-speech-to-text's People

Contributors

dependabot[bot], radomird, matt-clarson, ollyjohn, mristic505, synapse, dancamma, simoneb

fast-speech-to-text's Issues

Live translations?

How about we add the ability to translate the user speech to another language, live?

The setup would be the same as we have, in the sense that we capture the audio, but rather than just processing it and echoing it back to the user, we pipe that through one of the classic AI models, which are also capable of translating text to other languages. The challenge here might be in doing the live translation. Similar to this, but a simultaneous translation.

Clearly this would also depend on #1, so that we can speak it back to the user.

Convert to React/Next.js

The initial work on this project's UI has been carried out in Svelte. To remain consistent with the other NF apps, we should convert it to React before beginning any other work.

Make voice recognition more reliable

When a participant in the chat room hits the "record" button, the app should record and recognise the voice until the user presses the "stop recording" button.

The current behaviour is a bit erratic and shows a few issues:

  • talking with small pauses between words restarts the voice recognition, and everything spoken before is lost
  • the "stop recording" button is enabled while voice recognition is still processing the spoken words; it should be disabled during that time
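One possible way to address the first issue is to keep recognised text in a buffer that outlives each recognition session. The sketch below is an assumption about how that could look, not the project's implementation; `TranscriptBuffer` and its methods are hypothetical names. The `isFinal` argument mirrors the `isFinal` flag that the Web Speech API exposes on each recognition result.

```typescript
// Hypothetical buffer that keeps recognised text across recognition restarts.
class TranscriptBuffer {
  private finalised: string[] = [];
  private interim = "";

  // Call for each recognition result of the current session.
  addResult(text: string, isFinal: boolean): void {
    const cleaned = text.trim();
    if (isFinal) {
      if (cleaned !== "") this.finalised.push(cleaned);
      this.interim = "";
    } else {
      // Interim results replace each other until a final one arrives.
      this.interim = cleaned;
    }
  }

  // Call when recognition ends on its own (for example after a pause)
  // and is about to be restarted.
  onRestart(): void {
    // Promote pending interim text so it is not lost with the old session.
    if (this.interim !== "") this.finalised.push(this.interim);
    this.interim = "";
  }

  // Everything recognised so far, including the current interim text.
  text(): string {
    const parts = this.interim === "" ? this.finalised : this.finalised.concat([this.interim]);
    return parts.join(" ");
  }
}
```

With a buffer like this, the recogniser can be restarted after every pause while the user still sees one continuous transcript until "stop recording" is pressed.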

Update Readme

Update the Readme file to explain how to run the project locally.

Add firebase authentication

As the title implies, we need to set up Firebase authentication for the application and set up rules on the RT DB as well.

Add firebase auth to the server

The server uses Firebase RT DB to store the chatrooms and the events. After we set up rules so that only authenticated users can access the DB, the server now fails because it is missing authentication for the service account.

Live chat with translation

After playing around with #3, I realize that in practice this would be quite awkward to do: hearing your message spoken back to you in a different language while you are still speaking doesn't make for a terribly pleasant conversation. In any case, let's work on this under the assumption that the recipient of the translated audio would be a different person. As an extension of this work (but not as part of the PR for this work), we could consider setting up a simple p2p audio conversation so you could:
  • speak in English to, say, an Italian
  • they would hear Italian voice and talk back in Italian
  • you would hear what they said in English

Obviously, replace Italian with any other language.

Originally posted by @simoneb in #2 (comment)

Fix styling and FE bugs

  • When typing a character in the language dropdown, the selection should jump to the first option that starts with this character.
  • Fix the font family.
  • Add flags next to user names in the chatroom, per the design.
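The dropdown behaviour in the first item can be sketched as a small lookup. `findTypeaheadIndex` is a hypothetical helper, and matching case-insensitively is an assumption; the issue does not specify it.

```typescript
// Returns the index of the first option whose label starts with the typed
// character (ignoring case), or -1 when no option matches.
function findTypeaheadIndex(options: string[], typed: string): number {
  const needle = typed.toLocaleLowerCase();
  if (needle === "") return -1;
  for (let i = 0; i < options.length; i++) {
    if (options[i].toLocaleLowerCase().indexOf(needle) === 0) return i;
  }
  return -1;
}
```

A keydown handler on the dropdown could call this with the list of language labels and move the selection whenever the result is not -1.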

Implement live translation

Implement live translation functionality. The "Add a voice message" button should be replaced with a speech detection toggle, which should be enabled by default.
