speechaugment.jl's Introduction

SpeechAugment

Motivation

Most AI algorithms are data-driven, so the quality of the training data largely determines the quality of the model. This dependence on data is an inherent weakness of deep learning, and data augmentation is an effective way to mitigate it.

Methods

This repo supports audio data augmentations such as:

  • reverberation
  • background noise
  • distortion
  • packet loss simulation
  • farfield effect
  • speed perturbation

After these time-domain augmentations, a feature extraction step can be applied.

Installation

To install the stable release, type the following in the Julia REPL (] switches to Pkg mode):

] add SpeechAugment

or

using Pkg; Pkg.add("SpeechAugment")

To install the development version, type the following in the Julia REPL (] switches to Pkg mode):

] add https://github.com/sonosole/SpeechAugment.jl.git

Example

using WAV
using SpeechAugment

# 1. read a wav file as a speech example
batchsize = 8;
data,fs = wavread("/XXPath/ASpeechExample.wav");

# 2. init all the augmentation functions you want
echo  = initAddEcho(fs, (0.05,0.4), (3.0,3.2,2.5,3.5,2.0,3.0));
noise = initAddNoise("XXPathFullOfNoiseWAVs", 2, (5,15));
clip  = initClipWav((0.5,2.0));
drop  = initDropWav(fs, (0.09,0.15));
far   = initFarfieldWav(fs, (0.4,0.9));
speed = initSpeedWav((0.8,1.2));

# 3. make a function list or array
fnlist = [echo noise clip drop far speed];

# 4. augment `batchsize` copies of the audio
wavs = Vector(undef, batchsize)
for i = 1:batchsize
    wavs[i] = copy(data)
end
wavs = augmentWavs(fnlist, wavs)
for i = 1:batchsize
    # write each result with the sampling rate read from the input file
    wavwrite(wavs[i], "A$i.wav", Fs=fs, nbits=32)
end

# there is also a function called `augmentWav`;
# it augments one audio into `batchsize` audios.
audios = augmentWav(fnlist, data, batchsize)
for i = 1:batchsize
    # write each result with the sampling rate read from the input file
    wavwrite(audios[i], "B$i.wav", Fs=fs, nbits=32)
end

Function Parameters

initAddEcho(fs::Number, T₆₀Span::NTuple{2,Number}, roomSpan::NTuple{6,Number}) -> addecho(wav::Array)
  • fs: sampling rate
  • T₆₀Span: range of the effective reverberation time, e.g. (minT60, maxT60)
  • roomSpan: range of the room size, e.g. (MinL, MaxL, MinW, MaxW, MinH, MaxH)

[Figure: addEcho]
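
For intuition only, reverberation can be approximated by convolving the signal with a decaying impulse response. The sketch below is not the package's implementation: toy_impulse_response and toy_reverb are hypothetical names, the tail is plain noise whose envelope falls by 60 dB after T₆₀ seconds, and the room size is ignored entirely.

```julia
using Random

# Hypothetical helper (not part of SpeechAugment): a noise tail whose
# amplitude envelope has dropped by 60 dB at t = t60 seconds.
function toy_impulse_response(fs::Real, t60::Real; rng=Random.default_rng())
    n = max(1, round(Int, fs * t60))
    t = (0:n-1) ./ fs
    envelope = 10 .^ (-3 .* t ./ t60)   # 20*log10(envelope) = -60 dB at t = t60
    h = randn(rng, n) .* envelope
    h[1] = 1.0                          # direct path
    return h
end

# Hypothetical helper: direct time-domain convolution, truncated to the
# length of the input so the output lines up with the original signal.
function toy_reverb(wav::AbstractVector{<:Real}, h::AbstractVector{<:Real})
    out = zeros(Float64, length(wav))
    for i in 1:length(wav)
        for k in 1:length(h)
            j = i + k - 1
            j > length(out) && break    # leaves only the inner loop
            out[j] += wav[i] * h[k]
        end
    end
    return out
end
```

A longer T₆₀ gives a longer tail, which is why the span (minT60, maxT60) controls how strong the reverberation can get.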

initAddNoise(path::String, period::Int, dBSpan::NTuple{2,Number}) -> addnoise(speech::Array)
  • path: a directory that contains only noise WAV files
  • period: how often the noise changes; a different noise WAV is used every period times
  • dBSpan: range of the SNR in dB, e.g. (mindB, maxdB)

[Figure: addNoise]
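
To make the SNR parameter concrete, here is a minimal sketch of mixing noise into speech at a chosen SNR. The name mix_at_snr is hypothetical and the code is not taken from the package; it only shows the standard power-ratio arithmetic, assuming the noise is at least as long as the speech.

```julia
# Hypothetical helper (not part of SpeechAugment): scale the noise so that
# 10*log10(speech power / noise power) equals snr_db, then add it.
function mix_at_snr(speech::AbstractVector{<:Real},
                    noise::AbstractVector{<:Real}, snr_db::Real)
    length(noise) >= length(speech) ||
        throw(ArgumentError("noise must be at least as long as speech"))
    seg = noise[1:length(speech)]
    ps = sum(abs2, speech) / length(speech)   # mean speech power
    pn = sum(abs2, seg) / length(seg)         # mean noise power
    pn > 0 || return float.(speech)           # silent noise: nothing to add
    gain = sqrt(ps / (pn * 10^(snr_db / 10)))
    return speech .+ gain .* seg
end
```

A lower SNR means louder noise relative to the speech, so a span such as (5, 15) covers moderately to mildly noisy conditions.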

initClipWav(clipSpan::NTuple{2,Number}) -> clipwav(wav::Array)
  • clipSpan: range of the clipping amount applied to a WAV, e.g. (0.5, 2.0)

[Figure: distortion]
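
The README does not spell out how the clipping value is applied. One plausible reading, which is an assumption and not taken from the package source, is a gain applied before limiting the samples to [-1, 1]. The hypothetical clip_with_gain below sketches that reading.

```julia
# Hypothetical helper (not part of SpeechAugment): amplify or attenuate,
# then hard-limit every sample to the range [-1, 1].
clip_with_gain(wav::AbstractVector{<:Real}, gain::Real) =
    clamp.(gain .* wav, -1.0, 1.0)
```

Under this assumption, values above 1.0 flatten the peaks of a loud signal, while values below 1.0 only attenuate it.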

initDropWav(fs::Real, ratioSpan::NTuple{2,Number}) -> dropwav(wav::Array)
  • fs: sampling rate
  • ratioSpan: range of the dropping ratio, e.g. (0.02, 0.09). The upper limit is 1.0.

[Figure: randomdrop]
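
Packet loss is commonly simulated by silencing short frames at random. The sketch below illustrates that idea with a hypothetical drop_frames function; the 20 ms frame length and the independent per-frame decision are assumptions, not details taken from the package.

```julia
using Random

# Hypothetical helper (not part of SpeechAugment): zero out each frame
# independently with probability `ratio`.
function drop_frames(wav::AbstractVector{<:Real}, fs::Real, ratio::Real;
                     frame_sec::Real=0.02, rng=Random.default_rng())
    0 <= ratio <= 1 || throw(ArgumentError("ratio must be in [0, 1]"))
    out = float.(wav)                          # work on a copy
    flen = max(1, round(Int, fs * frame_sec))  # frame length in samples
    for start in 1:flen:length(out)
        if rand(rng) < ratio
            stop = min(start + flen - 1, length(out))
            out[start:stop] .= 0
        end
    end
    return out
end
```

With ratio = 0.0 the signal is returned unchanged, and with ratio = 1.0 every frame is silenced, which matches the stated upper limit of 1.0.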

initFarfieldWav(fs::Real, maxvalueSpan::NTuple{2,Number}) -> farfieldwav(wav::Array)
  • fs: sampling rate
  • maxvalueSpan: a range within (0.0, 1.0). Smaller values mean farther away. (0.2, 0.9) is recommended.

[Figure: farfield]
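
The parameter name suggests that the value sets the peak amplitude of the output, so a smaller peak sounds farther away. The hypothetical scale_peak below sketches only that amplitude part, as an assumption; whatever else the package does to model distance is not covered here.

```julia
# Hypothetical helper (not part of SpeechAugment): rescale the signal so
# that its largest absolute sample equals `maxvalue`.
function scale_peak(wav::AbstractVector{<:Real}, maxvalue::Real)
    peak = maximum(abs, wav)
    peak > 0 || return float.(wav)   # all-zero input stays as it is
    return wav .* (maxvalue / peak)
end
```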

initSpeedWav(speedSpan::NTuple{2,Number}) -> speedwav(wav::Array)
  • speedSpan: range of the speed perturbation. (0.85, 1.15) is recommended.

[Figures: fast, slow]
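
Speed perturbation can be pictured as resampling: reading the signal faster yields fewer samples, and reading it slower yields more. The sketch below uses linear interpolation in a hypothetical change_speed function; the package may resample differently, so treat this as an illustration rather than its method.

```julia
# Hypothetical helper (not part of SpeechAugment): read the input at
# `speed` samples per output sample, interpolating linearly in between.
function change_speed(wav::AbstractVector{<:Real}, speed::Real)
    speed > 0 || throw(ArgumentError("speed must be positive"))
    n = length(wav)
    n > 0 || return Float64[]
    m = max(1, floor(Int, n / speed))       # output length
    out = Vector{Float64}(undef, m)
    for j in 1:m
        pos = 1 + (j - 1) * speed           # fractional read position
        i = min(floor(Int, pos), n)
        frac = pos - i
        nxt = min(i + 1, n)                 # hold the last sample at the end
        out[j] = (1 - frac) * wav[i] + frac * wav[nxt]
    end
    return out
end
```

A speed of 1.2 shortens the audio to roughly 1/1.2 of its length, and 0.8 stretches it to roughly 1/0.8 of its length.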

All NTuple{2,Number} parameters must put the smaller value first and the larger value second, i.e. (minvalue, maxvalue). To control the extent of augmentation precisely, the following functions can be used:

  • addEcho
  • addNoise
  • clipWav
  • dropWav
  • farfieldWav
  • speedWav

For details, check the documentation or use the help?> mode in the REPL.
