annotate_audio's Introduction

Annotate audio

These Python helper scripts turn one large audio file into smaller annotated audio files, for example to train STT or TTS models. They work in three steps:

1. Split the large file into several smaller WAV files, cut at silences. If there are several speakers in your audio, you can also remove the parts spoken by the other speaker(s).
2. (Optional) Get transcriptions for these smaller audio files from the Google Cloud STT service. This requires a GCP account.
3. Manually annotate the smaller audio files (or correct the GCP annotations).
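The README does not say how split.py finds the silences (it only states that step 1 needs ffmpeg), so the following is an illustration of the general idea, not that script's method. It is a minimal standard-library sketch of silence-based splitting, assuming a 16-bit mono PCM WAV input and a simple per-frame RMS energy threshold. The functions `find_voiced_segments` and `split_wav` and all their thresholds are hypothetical, and speaker filtering is not covered.

```python
import array
import math
import sys
import wave
from typing import List, Sequence, Tuple


def find_voiced_segments(
    samples: Sequence[int],
    sample_rate: int,
    frame_ms: int = 30,
    silence_threshold: float = 500.0,
    min_silence_ms: int = 300,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of voiced regions separated by silence.

    A frame is silent when its RMS amplitude is below silence_threshold; a
    segment ends once min_silence_ms of consecutive silent frames is seen.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_silent_frames = max(1, min_silence_ms // frame_ms)
    segments: List[Tuple[int, int]] = []
    start = None         # first sample of the current voiced segment
    last_voiced_end = 0  # end of the most recent voiced frame
    silent_run = 0       # consecutive silent frames inside the current segment

    for offset in range(0, len(samples), frame_len):
        frame = samples[offset:offset + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= silence_threshold:
            if start is None:
                start = offset
            last_voiced_end = offset + len(frame)
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                segments.append((start, last_voiced_end))
                start, silent_run = None, 0
    if start is not None:  # audio ended while still inside a segment
        segments.append((start, last_voiced_end))
    return segments


def split_wav(in_path: str, out_prefix: str, **kwargs) -> List[str]:
    """Write each voiced segment of a 16-bit mono WAV to its own file."""
    with wave.open(in_path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM WAV")
        rate = wf.getframerate()
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is little-endian

    out_paths = []
    for i, (start, end) in enumerate(find_voiced_segments(samples, rate, **kwargs)):
        chunk = samples[start:end]
        if sys.byteorder == "big":
            chunk.byteswap()  # back to little-endian for writing
        out_path = f"{out_prefix}_{i:04d}.wav"
        with wave.open(out_path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(rate)
            out.writeframes(chunk.tobytes())
        out_paths.append(out_path)
    return out_paths
```

The threshold and minimum-silence length need tuning for each recording. A pure-Python RMS loop is also slow on hours of audio, which is a good reason to hand this work to a dedicated tool such as ffmpeg.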

Installation

Step 1 requires ffmpeg to be installed on your system.
All the scripts are written in Python 3.6+. The required packages can be installed with:

pip install -r requirement.txt

You will need pyaudio for step 3.

Additionally, if you want to use GCP's STT service, install its Python client with:

pip install --upgrade google-cloud-speech

and configure a project as shown here.
The current version of this script is compatible with google-cloud-speech 2.x. If you want to use version 1.x, look at earlier versions of this repository, which used that version.

Usage

python split.py --input big_file.wav --audio_folder audio --out_csv sentences.csv

The sentences.csv file is formatted as "file;sentence".
To keep only files spoken by a particular speaker, use the "--remove_bad_segments" and "--speaker_segment" arguments.
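The "file;sentence" layout is a semicolon-delimited CSV. Here is a minimal sketch for reading and writing it with Python's standard `csv` module. It assumes UTF-8, no header row, and one clip per line. `read_sentences` and `write_sentences` are hypothetical helpers, not functions from this repository. Note that `csv` quotes any sentence that itself contains a `;`, and the repository's scripts may not expect that.

```python
import csv
from typing import Dict, Iterable, Tuple


def read_sentences(csv_path: str) -> Dict[str, str]:
    """Map each audio file name to its sentence, from a 'file;sentence' CSV."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row[0]: row[1] for row in csv.reader(f, delimiter=";") if len(row) >= 2}


def write_sentences(csv_path: str, rows: Iterable[Tuple[str, str]]) -> None:
    """Write (file, sentence) pairs as 'file;sentence' lines."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter=";").writerows(rows)
```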

python get_gcp_transcription.py --audio_folder audio --csv sentences.csv --language_code en-US
python annotate.py --audio_folder audio --csv sentences.csv
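Beyond its `--audio_folder` and `--csv` arguments, annotate.py's interface is not described here. This is therefore only a hypothetical sketch of a step-3 review loop: play each clip with PyAudio, show its current sentence, and keep it or type a replacement. `play_wav` and `review` are illustrative names, not annotate.py's functions. The PyAudio import is deferred so the loop can be exercised without an audio device.

```python
import os
import wave
from typing import Callable, List, Tuple


def play_wav(path: str, chunk: int = 1024) -> None:
    """Play a WAV file on the default output device with PyAudio."""
    import pyaudio  # imported lazily: only needed when audio is actually played

    with wave.open(path, "rb") as wf:
        pa = pyaudio.PyAudio()
        stream = pa.open(
            format=pa.get_format_from_width(wf.getsampwidth()),
            channels=wf.getnchannels(),
            rate=wf.getframerate(),
            output=True,
        )
        try:
            data = wf.readframes(chunk)
            while data:
                stream.write(data)
                data = wf.readframes(chunk)
        finally:
            stream.stop_stream()
            stream.close()
            pa.terminate()


def review(
    rows: List[Tuple[str, str]],
    audio_folder: str = "audio",
    play: Callable[[str], None] = play_wav,
    ask: Callable[[str], str] = input,
) -> List[Tuple[str, str]]:
    """Play each clip, show its current sentence, and keep or replace it.

    An empty answer keeps the existing sentence; any other answer replaces it.
    """
    reviewed = []
    for name, sentence in rows:
        play(os.path.join(audio_folder, name))
        answer = ask(f"{name} [{sentence}] > ").strip()
        reviewed.append((name, answer or sentence))
    return reviewed
```

Injecting `play` and `ask` keeps the loop testable and lets you swap in another player.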

To see the additional arguments of any of the three scripts, run:

python FILE_NAME.py -h

annotate_audio's People

Contributors

natgr

Forkers

jspaulsen

annotate_audio's Issues

Step 2 Error: recognize takes from 1 to 2 positional arguments

Running step 2 (get_gcp_transcription.py) fails after configuring GCP, enabling the API and billing, and installing the listed dependencies.

Command:
python get_gcp_transcription.py --audio_folder audio --csv sentences.csv --language_code en-US

Error:

files transcribed: 0%| | 0/42 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "get_gcp_transcription.py", line 31, in <module>
    response = client.recognize(config, audio)
TypeError: recognize() takes from 1 to 2 positional arguments but 3 were given
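This TypeError matches the google-cloud-speech 2.x client. There, `recognize()` takes at most one positional `request` argument, and `config`/`audio` are keyword-only, so the 1.x-style call `client.recognize(config, audio)` fails. Here is a minimal sketch of a 2.x-compatible call for one short segment. It assumes 16-bit mono PCM WAV files and Google application default credentials already configured. `transcribe_wav` is a hypothetical helper, not necessarily how get_gcp_transcription.py is written.

```python
import wave
from pathlib import Path


def transcribe_wav(path: str, language_code: str = "en-US") -> str:
    """Transcribe one short 16-bit mono WAV with google-cloud-speech 2.x."""
    # Imported lazily so this module loads even without the GCP client installed.
    from google.cloud import speech

    with wave.open(path, "rb") as wf:
        sample_rate = wf.getframerate()  # must match the WAV header

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(content=Path(path).read_bytes())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate,
        language_code=language_code,
    )
    # 2.x: pass config/audio as keyword arguments (or a single request object).
    # The positional 1.x-style call client.recognize(config, audio) raises
    # "recognize() takes from 1 to 2 positional arguments but 3 were given".
    response = client.recognize(config=config, audio=audio)
    return " ".join(
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    )
```

Synchronous `recognize` is intended for short clips, which fits the small files produced by step 1.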