annotate_audio's Introduction

Annotate audio

These Python helper scripts turn one large audio file into smaller annotated audio files for training STT or TTS models, in three steps:

1. Split the large file into several smaller wav files, cut at silences. If your audio contains several speakers, you can also remove the parts spoken by the other speaker(s).
2. (Optional) Get transcriptions for these smaller audio files from the Google Cloud STT service. This requires a GCP account.
3. Manually annotate the smaller audio files (or correct the GCP annotations).
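
To illustrate the idea behind step 1, here is a minimal pure-Python sketch of silence-based segmentation. It is not the code in split.py, which relies on ffmpeg. The function name `find_voiced_segments` and the parameters `threshold` and `min_silence_len` are hypothetical and chosen only for this example.

```python
def find_voiced_segments(samples, threshold, min_silence_len):
    """Return (start, end) index pairs of non-silent runs, end exclusive.

    A run of at least `min_silence_len` consecutive samples whose absolute
    value is below `threshold` is treated as silence and closes the
    current segment. Shorter quiet gaps stay inside the segment.
    """
    segments = []
    start = None    # start index of the open segment, None while in silence
    quiet_run = 0   # consecutive quiet samples seen inside the open segment
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence_len:
                # The segment ends where the quiet run began.
                segments.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # Close a segment still open at the end, dropping trailing quiet samples.
        segments.append((start, len(samples) - quiet_run))
    return segments


# Toy amplitudes: two voiced bursts separated by three quiet samples.
toy = [0, 0, 5, 6, 0, 7, 0, 0, 0, 8, 9, 0]
print(find_voiced_segments(toy, threshold=3, min_silence_len=2))
```

Each returned pair could then be used to cut one small wav file out of the large recording. Real audio would need a threshold and minimum silence length tuned to the recording.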

Installation

Step 1 requires ffmpeg to be installed on your system.
All the scripts are written for Python 3.6+. The required packages can be installed with:

pip install -r requirement.txt

You will need pyaudio for step 3.

Additionally, if you want to use GCP's STT, you should install their Python client with

pip install --upgrade google-cloud-speech

and configure a project as shown here.
The current version of this script is compatible with google-cloud-speech 2.X. If you want to use version 1.X, have a look at previous versions of this repo, which used that version.

Usage

python split.py --input big_file.wav --audio_folder audio --out_csv sentences.csv

The sentences.csv file will be formatted as "file;sentence".
To keep only files spoken by a particular speaker, use the "--remove_bad_segments" and "--speaker_segment" arguments.
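
As a rough sketch, a "file;sentence" file like the one described above could be read with Python's csv module. This assumes there is no header row and that the first field is the audio file name. The helper name `read_sentences` and the sample file names are hypothetical.

```python
import csv
import io


def read_sentences(handle):
    """Yield (file, sentence) pairs from a ';'-separated text stream."""
    reader = csv.reader(handle, delimiter=";")
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Assumption: the first field is the file name and the rest is the
        # sentence. Re-join in case the sentence itself contained a ';'.
        yield row[0], ";".join(row[1:])


# In-memory stand-in for sentences.csv; the second row has no transcription yet.
sample = io.StringIO("audio/0001.wav;hello world\naudio/0002.wav;\n")
for file_name, sentence in read_sentences(sample):
    print(file_name, repr(sentence))
```

With a real file, the same function would be called as `read_sentences(open("sentences.csv", newline="", encoding="utf-8"))`. The encoding is an assumption.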

python get_gcp_transcription.py --audio_folder audio --csv sentences.csv --language_code en-US

python annotate.py --audio_folder audio --csv sentences.csv

For all three scripts, you can see additional arguments with

python FILE_NAME.py -h


annotate_audio's Issues

Step 2 Error: recognize takes from 1 to 2 positional arguments

Attempting to execute step 2 (get_gcp_transcription.py) after configuring GCP, enabling the API and billing, and installing the listed dependencies fails with the error below.

Command:
python get_gcp_transcription.py --audio_folder audio --csv sentences.csv --language_code en-US

Error:

files transcribed:   0%|          | 0/42 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "get_gcp_transcription.py", line 31, in <module>
    response = client.recognize(config, audio)
TypeError: recognize() takes from 1 to 2 positional arguments but 3 were given
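
A likely cause of the TypeError above is that google-cloud-speech 2.X accepts `config` and `audio` only as keyword arguments (or wrapped in a single request object), whereas 1.X accepted them positionally. The sketch below reproduces the error with a hypothetical stand-in class, `StandInSpeechClient`, which copies only the parameter shape of `recognize()` so that the example runs without a GCP account. It is not the real client.

```python
class StandInSpeechClient:
    """Hypothetical stand-in mimicking the parameter shape of
    recognize() in google-cloud-speech 2.X. It does not call any API."""

    def recognize(self, request=None, *, config=None, audio=None):
        # Echo the arguments back so the call can be inspected.
        return {"config": config, "audio": audio}


client = StandInSpeechClient()
config = {"language_code": "en-US"}   # placeholder for a RecognitionConfig
audio = {"content": b"\x00\x01"}      # placeholder for a RecognitionAudio

# 1.X-style positional call: fails, because only `request` may be positional.
try:
    client.recognize(config, audio)
except TypeError as err:
    positional_error = str(err)
    print("positional call failed:", positional_error)

# 2.X-style keyword call: accepted.
response = client.recognize(config=config, audio=audio)
print("keyword call returned:", response)
```

Under that assumption, changing line 31 of get_gcp_transcription.py to `response = client.recognize(config=config, audio=audio)` should resolve the error with the 2.X client.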

natgr / annotate_audio