EchoMind

Speech analysis script that converts audio into text and phonemes

This readme consists of:

  • Overview of functionality
  • Installation guide
  • Usage guide

Functionality

The script accepts individual audio files as well as nested (multi-layered) folders containing audio files. It converts the audio files to WAV files and segments them into 15* second clips. It then analyses these segments and outputs transcripts, IPA data, and phones.
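
As a rough illustration of the segmentation step described above, the sketch below walks a nested folder for WAV files and splits one into 15-second clips using only the Python standard library. The function names `find_wav_files` and `split_wav` and the clip naming scheme are assumptions made for this example and are not taken from the script itself. Conversion from other audio formats to WAV is not shown.

```python
import wave
from pathlib import Path
from typing import List

CLIP_SECONDS = 15  # clip length from the description above


def find_wav_files(root: Path) -> List[Path]:
    """Return every .wav file under root, however deeply nested."""
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".wav"
    )


def split_wav(src: Path, out_dir: Path, clip_seconds: int = CLIP_SECONDS) -> List[Path]:
    """Split src into consecutive clips of at most clip_seconds each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    clips: List[Path] = []
    with wave.open(str(src), "rb") as reader:
        frames_per_clip = reader.getframerate() * clip_seconds
        while True:
            frames = reader.readframes(frames_per_clip)
            if not frames:
                break  # end of the source file
            # Hypothetical naming scheme: <source name>_<clip index>.wav
            clip_path = out_dir / f"{src.stem}_{len(clips):03d}.wav"
            with wave.open(str(clip_path), "wb") as writer:
                # Copy channel count, sample width and sample rate from the source.
                writer.setnchannels(reader.getnchannels())
                writer.setsampwidth(reader.getsampwidth())
                writer.setframerate(reader.getframerate())
                writer.writeframes(frames)
            clips.append(clip_path)
    return clips
```

The final clip is simply shorter when the recording length is not a multiple of 15 seconds.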

How to install

You will need the following projects and software to run the script:

  • Terminal software
  • Python Libraries
  • Allosaurus
  • Vosk
  • Whisper
  • Whisper-timestamped
  • PocketSphinx
  • Espnet
  • Talk bank data

Terminal software

wget

Ubuntu/Debian

sudo apt-get install wget

Windows

To install and configure wget for Windows:

Download wget for [Windows](https://gnuwin32.sourceforge.net/packages/wget.htm "Windows") and install the package.
Add the wget bin path to environment variables (optional). Configuring this removes the need for full paths, and makes it a lot easier to run wget from the command prompt:
    Open the Start menu and search for “environment.”
    Select Edit the system environment variables.
    Select the Advanced tab and click the Environment Variables button.
    Select the Path variable under System Variables.
    Click Edit.
    In the Variable value field add the path to the wget bin directory preceded by a semicolon (;). If installed in the default path, add C:\Program Files (x86)\GnuWin32\bin.
Open the command prompt (cmd.exe) and start running wget commands.

ffmpeg

Ubuntu/Debian

sudo apt update && sudo apt install ffmpeg

Windows

On Windows using Chocolatey (https://chocolatey.org/):

choco install ffmpeg

On Windows using Scoop (https://scoop.sh/):

scoop install ffmpeg
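
Once both tools are installed, a quick check that they are reachable from the command line can save some debugging later. The snippet below is a minimal sketch using `shutil.which`; the helper name `missing_tools` is made up for this example and is not part of the script.

```python
import shutil
from typing import Iterable, List

REQUIRED_TOOLS = ("wget", "ffmpeg")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("wget and ffmpeg are both available.")
```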

Python libraries

Ubuntu/Debian

sudo apt install python3-pip
sudo apt-get install -y python3-pyfiglet
sudo apt-get install python3-termcolor
sudo apt-get install -y python3-pydub
sudo apt-get install python3-moviepy 
sudo apt install python3-numpy
sudo apt-get install libsndfile1
pip3 install soundfile
pip3 install espnet
pip3 install espnet-model-zoo
pip3 install espnet-tts-frontend

Windows

py get-pip.py
pip install pyfiglet
pip install termcolor
pip install pydub
pip install moviepy
pip install numpy 
pip install soundfile
pip install espnet
pip install espnet-model-zoo
pip install espnet-tts-frontend
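
To confirm that the libraries above were installed into the Python environment you intend to use, something like the following can be run. It is a sketch only: `missing_modules` is a hypothetical helper, and the list contains the import names of the packages above (espnet-tts-frontend is left out because its import name is not given in this readme).

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages listed above; "espnet_model_zoo" is the
# import name of the espnet-model-zoo distribution.
REQUIRED_MODULES = (
    "pyfiglet",
    "termcolor",
    "pydub",
    "moviepy",
    "numpy",
    "soundfile",
    "espnet",
    "espnet_model_zoo",
)


def missing_modules(modules: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules():
        print(f"Missing Python module: {name}")
```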

Allosaurus

To install Allosaurus, use the pip command, or install it from the git repo:

pip install allosaurus

You will then need to install a model for it to run. The script is set to use the English model, which you will need to download:

python3 -m allosaurus.bin.download_model -m eng2102

However, you can download and use other models if you wish.

Vosk

You can install Vosk through the pip command:

pip3 install vosk

You will also need to download a model, unpack it, and update the following line in the script to point to the correct path:

modelV_path = "/home/parallels/Downloads/vosk-model-en-us-0.22-lgraph"
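
If editing the script on every machine is inconvenient, one option is to read the model location from an environment variable and fall back to the hard-coded path. The sketch below assumes a variable named `VOSK_MODEL_PATH` and a helper named `resolve_model_path`; both are hypothetical and are not something the script currently supports.

```python
import os
from pathlib import Path

# Default copied from the script line quoted above.
DEFAULT_MODEL_PATH = "/home/parallels/Downloads/vosk-model-en-us-0.22-lgraph"


def resolve_model_path(env_var: str = "VOSK_MODEL_PATH") -> Path:
    """Return the model directory, preferring the environment variable if set."""
    path = Path(os.environ.get(env_var, DEFAULT_MODEL_PATH)).expanduser()
    if not path.is_dir():
        # Fail early with a clear message instead of a later model-loading error.
        raise FileNotFoundError(f"Vosk model directory not found: {path}")
    return path
```

With this in place, the script line could become `modelV_path = str(resolve_model_path())`.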

Whisper

To install Whisper, use the pip command:

pip install -U openai-whisper

whisper-timestamped

Install using the pip command:

pip3 install git+https://github.com/linto-ai/whisper-timestamped

A lightweight, CPU-only version of Torch can be installed for CPU processing, as opposed to GPU processing:

pip3 install \
     torch==1.13.1+cpu \
     torchaudio==0.13.1+cpu \
     -f https://download.pytorch.org/whl/torch_stable.html

To update to the latest version:

pip3 install --upgrade --no-deps --force-reinstall git+https://github.com/linto-ai/whisper-timestamped

CMU PocketSphinx

To install pocketsphinx, use the pip command:

pip3 install pocketsphinx

Then, to get the phone dictionary for English, I would recommend cloning this git repo:

https://github.com/cmusphinx/pocketsphinx/tree/c178c8dc1948685ed93c6c4ee93122a7bc789cfd

Then get the path to the model and update it accordingly in the script. This lets the script access the phone dictionary.

Talk Bank

To download a file or folder from TalkBank, use the example command below, or look at the how-to-download PDF:

wget -e robots=off -R "index.html*" -N -nH -l inf -r --no-parent https://media.talkbank.org/ca/GulfWar/

This command used to be in the PDF, but it has since been removed.

The medical cases will require a username and password to access. Prof. Xu has one, but we are working on getting our own.
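
For reference, the wget example above can also be driven from Python, which makes it easier to see what each flag does. This is a sketch under assumptions: `build_wget_command` and `download` are hypothetical helpers, and password-protected collections are not handled here.

```python
import subprocess
from typing import List


def build_wget_command(url: str) -> List[str]:
    """Build the recursive wget invocation shown above as an argument list."""
    return [
        "wget",
        "-e", "robots=off",   # ignore robots.txt
        "-R", "index.html*",  # reject generated directory index pages
        "-N",                 # only fetch files newer than the local copies
        "-nH",                # do not create a directory named after the host
        "-l", "inf",          # no limit on recursion depth
        "-r",                 # recursive download
        "--no-parent",        # never ascend above the starting directory
        url,
    ]


def download(url: str) -> None:
    """Run wget and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_wget_command(url), check=True)
```

Calling `download("https://media.talkbank.org/ca/GulfWar/")` would then mirror that folder into the current working directory.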

How to use

After cloning the repo and downloading and installing the required software, you should be good to go. To run the script, place the audio files you wish to analyse in the InputAudioData folder under the correct section. Then run the script using the command:

python speechProcess.py

There is also a verbose mode:

python speechProcess.py --verbose
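
The readme does not show how the script handles this flag internally. As an illustration only, a boolean flag like `--verbose` is commonly parsed with `argparse` along these lines; the `parse_args` function and help text below are assumptions, not code from speechProcess.py.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse command-line options; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(
        description="Convert audio into text and phonemes."
    )
    parser.add_argument(
        "--verbose",
        action="store_true",  # False unless the flag is given
        help="print progress details while processing",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    if args.verbose:
        print("Verbose mode enabled")
```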

Testing output

To analyse the accuracy of the speech-to-text output, place tests into testData/input. Tests should be in the form:

exampleFolder
---- exampleFolder.audio
---- text.txt (transcription text as a plain text file)
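
The readme does not say which accuracy measure the script reports. One common choice for comparing a speech-to-text transcript against a reference such as text.txt is word error rate (WER), sketched below. The function name `word_error_rate` and the lower-casing and whitespace tokenisation are assumptions made for this example.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1] / len(ref)
```

A value of 0.0 means the transcript matches the reference word for word. Values above 1.0 are possible when the transcript contains many extra words.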

Contributors

t-scholtz
