WhatsApp Voice Note Transcription

This Node.js application transcribes voice notes sent via WhatsApp. It converts each voice note to text using either OpenAI's Whisper API or Deepgram's API, and generates a summary and action steps using either OpenAI's GPT models or Anthropic's Claude models. The transcription and summary are then sent back to the user's WhatsApp account.

Features

  • Transcribes WhatsApp voice notes to text using OpenAI's Whisper API or Deepgram's API
  • Generates a summary and action steps from the transcription using OpenAI's GPT models or Anthropic's Claude models
  • Sends the transcription, summary, and action steps back to the user via WhatsApp
  • Supports configuration through environment variables to choose the AI service provider, voice transcription service, and specific models to use
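
As a minimal sketch of how environment-driven configuration like this could look, the snippet below reads the provider choices from an environment object and falls back to defaults. The variable names (`AI_SERVICE`, `VOICE_TRANSCRIPTION_SERVICE`, `AI_MODEL`) and the default values are assumptions for illustration; `.env.example` in the repository is the place to check the names the project actually uses.

```javascript
// Hypothetical variable names and defaults; the real ones live in .env.example.
const ALLOWED_AI_SERVICES = ['OPENAI', 'ANTHROPIC'];
const ALLOWED_VOICE_SERVICES = ['OPENAI', 'DEEPGRAM'];

// Reads provider choices from `env` and rejects unsupported values early.
function loadConfig(env = process.env) {
  const aiService = (env.AI_SERVICE || 'OPENAI').toUpperCase();
  const voiceService = (env.VOICE_TRANSCRIPTION_SERVICE || 'OPENAI').toUpperCase();

  if (!ALLOWED_AI_SERVICES.includes(aiService)) {
    throw new Error(`Unsupported AI service: ${aiService}`);
  }
  if (!ALLOWED_VOICE_SERVICES.includes(voiceService)) {
    throw new Error(`Unsupported voice transcription service: ${voiceService}`);
  }

  // A null model means "let the caller pick a default for the provider".
  return { aiService, voiceService, aiModel: env.AI_MODEL || null };
}
```

Passing the environment in as a parameter keeps the function easy to test without touching `process.env`.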

Warning

This app sends your audio to third-party services (OpenAI, Deepgram, Anthropic) to transcribe it and generate summaries. Be aware of the risks of automatically sending audio files out of your encrypted messages to third parties. It also relies on an unofficial WhatsApp library, which may break at any time or result in your account being banned by Meta.

Getting Started

These instructions will get you a copy of the project up and running on your local machine.

Prerequisites

Before you start, make sure you have the following:

  • Node.js and npm installed
  • A WhatsApp account
  • An API key from OpenAI for the GPT and Whisper APIs (if using OpenAI)
  • An API key from Anthropic for Claude models (if using Anthropic)
  • An API key from Deepgram for voice transcription (if using Deepgram)
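
Since only the keys for the services you select are required, a startup check along these lines can report what is missing. This is a sketch under assumptions: the key variable names (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `DEEPGRAM_API_KEY`) and the helper `missingApiKeys` are hypothetical and not taken from the project.

```javascript
// Hypothetical key names; the project's .env.example is the source of truth.
const KEY_FOR_PROVIDER = {
  OPENAI: 'OPENAI_API_KEY',
  ANTHROPIC: 'ANTHROPIC_API_KEY',
  DEEPGRAM: 'DEEPGRAM_API_KEY',
};

// Returns the names of the API key variables that the chosen providers need
// but that are missing or empty in `env`.
function missingApiKeys(aiService, voiceService, env = process.env) {
  const needed = new Set([KEY_FOR_PROVIDER[aiService], KEY_FOR_PROVIDER[voiceService]]);
  // Unknown providers map to undefined and are skipped here.
  return [...needed].filter((name) => name && !env[name]);
}
```

For example, choosing Anthropic for summaries and Deepgram for transcription would require two keys, while using OpenAI for both would require only one.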

Installing

  1. Clone the repository to your local machine:

    git clone https://github.com/nerveband/whatsapp_voice_transcription.git
    
  2. Navigate into the project directory:

    cd whatsapp_voice_transcription
    
  3. Install the dependencies using npm:

    npm install
    
  4. Create a .env file by copying the example file:

    cp .env.example .env
    
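  5. Set the necessary environment variables in the .env file: the API keys for the services you plan to use, plus your choice of AI service provider, voice transcription service, and models.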

  6. Run the application:

    npm start
    
  7. Scan the QR code generated in the console with WhatsApp on your phone (via Linked devices, as you would for WhatsApp Web) to log in.

Usage

After completing the installation steps above, send a voice note to your own WhatsApp account to test it. The app transcribes the voice note to text using the selected voice transcription service (OpenAI or Deepgram) and generates a summary and action steps using the selected AI service (OpenAI or Anthropic). The transcription is sent back to you on WhatsApp as one message, followed by the summary (if enabled) as a second message.
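
The flow described above can be sketched roughly as follows. The function `handleVoiceNote` and the injected `transcribe`, `summarize`, and `send` callbacks are hypothetical stand-ins for the transcription service, the AI service, and the WhatsApp client; they are not the project's actual API, and the message wording is invented.

```javascript
// Sketch of the flow described above. `transcribe`, `summarize`, and `send`
// are hypothetical injected async functions, not the project's real API.
async function handleVoiceNote(audio, { transcribe, summarize, send, summaryEnabled = true }) {
  // First message: the transcription itself.
  const transcript = await transcribe(audio);
  await send(`Transcription:\n${transcript}`);

  // Second message: summary and action steps, only when enabled.
  if (summaryEnabled) {
    const summary = await summarize(transcript);
    await send(`Summary and action steps:\n${summary}`);
  }
  return transcript;
}
```

Injecting the three callbacks keeps the flow independent of which provider is selected, so the same function would work with either transcription or AI service.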

Built With

Authors

  • Ashraf Ali, with lots of help from GPT-4 and Claude 3 Opus :)

License

This project is MIT Licensed. See the LICENSE file for details.
