
ascdtest's Introduction

Live Testing for Arabic Speech Command Recognition

Live testing app for this Arabic keyword spotting model.

Quickstart

$ virtualenv .env
$ source .env/bin/activate
$ pip install -r requirements.txt
$ python main.py

Dataset

Original Dataset: https://github.com/abdulkaderghandoura/arabic-speech-commands-dataset

The dataset is a list of pairs (x, y), where x is the input speech signal and y is the corresponding keyword. The final dataset consists of 12000 such pairs covering 40 keywords. Each audio file is one second long and sampled at 16 kHz. There were 30 participants, each of whom recorded 10 utterances of every keyword. This gives 300 audio files per keyword (30 × 10 = 300) and 12000 files in total (30 × 10 × 40 = 12000). The total size of the dataset is ~384 MB. The original dataset repository linked above lists the 40 chosen keywords with their translations into Arabic and pronunciations in the International Phonetic Alphabet (IPA).
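The counts above can be checked with a few lines of arithmetic. The sketch below also shows that the stated size is consistent with 16-bit mono PCM audio; the 2 bytes per sample figure is an assumption, since the sample format is not stated here.

```python
# Counts and sizes stated in the dataset description.
PARTICIPANTS = 30
UTTERANCES_PER_KEYWORD = 10
KEYWORDS = 40
SAMPLE_RATE_HZ = 16_000
CLIP_SECONDS = 1

# Assumption (not stated in the text): 16-bit mono PCM, i.e. 2 bytes per sample.
BYTES_PER_SAMPLE = 2

files_per_keyword = PARTICIPANTS * UTTERANCES_PER_KEYWORD
total_files = files_per_keyword * KEYWORDS
raw_audio_bytes = total_files * SAMPLE_RATE_HZ * CLIP_SECONDS * BYTES_PER_SAMPLE

print(files_per_keyword)      # 300
print(total_files)            # 12000
print(raw_audio_bytes / 1e6)  # 384.0 (MB of raw samples, excluding file headers)
```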

As commonly done in machine learning settings, we split the dataset into three subsets: training, validation, and testing.

Given the number of instances in our dataset, we keep 80% of them as the training set, 10% as the validation set, and the remaining 10% as a hold-out testing set.

In our split method, we guarantee that all recordings of a given contributor fall within the same subset. This avoids having similar signals in both the training and the validation/testing sets, which could affect the validity of the results. It also means the validation and testing sets measure how well the model generalizes to speakers it has not heard during training.
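A minimal sketch of such a speaker-disjoint split follows. It is not the project's actual split code: the function name `split_by_speaker`, the speaker labels, and the seed are hypothetical. Because every participant contributes the same number of recordings, splitting the 30 speakers 24/3/3 also yields an 80/10/10 split of the recordings.

```python
import random

def split_by_speaker(speaker_ids, train_frac=0.8, val_frac=0.1, seed=0):
    """Assign each speaker to exactly one of train/val/test."""
    speakers = sorted(set(speaker_ids))
    random.Random(seed).shuffle(speakers)
    n_train = round(len(speakers) * train_frac)
    n_val = round(len(speakers) * val_frac)
    return {
        "train": set(speakers[:n_train]),
        "val": set(speakers[n_train:n_train + n_val]),
        "test": set(speakers[n_train + n_val:]),
    }

# Hypothetical speaker labels; the real dataset's naming may differ.
speaker_ids = [f"speaker_{i:02d}" for i in range(30)]
subsets = split_by_speaker(speaker_ids)

# One (speaker, keyword, take) tuple per recording: 30 x 40 x 10 = 12000.
recordings = [(spk, kw, take)
              for spk in speaker_ids
              for kw in range(40)
              for take in range(10)]

# A recording goes to whichever subset owns its speaker.
by_subset = {name: [r for r in recordings if r[0] in owners]
             for name, owners in subsets.items()}

print({name: len(rows) for name, rows in by_subset.items()})
# {'train': 9600, 'val': 1200, 'test': 1200}
```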

Data Augmentation

We used audiomentations for the augmentation tasks.

To make up for the low volume of data, we combined several data augmentation techniques over 10 rounds, applying each augmentation with a probability of 0.5:

  • Add Gaussian noise to the samples
  • Time stretch the signal without changing the pitch
  • Shift the samples forwards or backwards
  • Frequency masking
  • Time masking
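The scheme of several rounds with an independent coin flip per augmentation can be sketched in plain Python. This is an illustration only and does not use the audiomentations API: the three transforms are simplified stand-ins, their parameter values are hypothetical, and time stretching and frequency masking are left out because they need real signal processing. Keeping the original clips alongside the augmented copies is also an assumption.

```python
import random

def add_gaussian_noise(x, rng, sigma=0.005):
    # Add zero-mean Gaussian noise to every sample.
    return [s + rng.gauss(0.0, sigma) for s in x]

def shift(x, rng, max_fraction=0.2):
    # Circularly shift forwards (k > 0) or backwards (k < 0).
    limit = int(len(x) * max_fraction)
    k = rng.randint(-limit, limit)
    return x[-k:] + x[:-k] if k else list(x)

def time_mask(x, rng, max_fraction=0.1):
    # Zero out one random contiguous span of the clip.
    width = rng.randint(0, int(len(x) * max_fraction))
    start = rng.randint(0, len(x) - width)
    return x[:start] + [0.0] * width + x[start + width:]

def augment(x, transforms, rng, p=0.5):
    # Each transform is applied independently with probability p.
    for transform in transforms:
        if rng.random() < p:
            x = transform(x, rng)
    return x

def augment_dataset(clips, transforms, rounds=10, p=0.5, seed=0):
    rng = random.Random(seed)
    out = list(clips)  # assumption: originals are kept
    for _ in range(rounds):
        out.extend(augment(c, transforms, rng, p) for c in clips)
    return out

clips = [[0.0] * 160 for _ in range(3)]  # three tiny dummy clips
augmented = augment_dataset(clips, [add_gaussian_noise, shift, time_mask])
print(len(augmented))  # 33: 3 originals + 10 rounds x 3 clips
```

With p = 0.5 and several transforms, some copies in each round pass through unchanged while others receive a combination of transforms, so the rounds produce varied copies of the same clip.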

We also added 3000 silent segments with some Gaussian noise to the dataset so that the model can detect silence.
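One way such segments could be generated is shown below, assuming one-second clips at 16 kHz like the rest of the dataset. The noise level `sigma` and the function name `make_silence` are hypothetical, and the demo builds only 5 clips rather than 3000 to stay fast.

```python
import random

SAMPLE_RATE_HZ = 16_000  # matches the dataset's sampling rate

def make_silence(rng, num_samples=SAMPLE_RATE_HZ, sigma=0.001):
    # "Silence" here is low-amplitude zero-mean Gaussian noise, not exact zeros.
    return [rng.gauss(0.0, sigma) for _ in range(num_samples)]

rng = random.Random(0)
silence_clips = [make_silence(rng) for _ in range(5)]  # the text uses 3000

print(len(silence_clips), len(silence_clips[0]))  # 5 16000
```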

Data Preprocessing

Mel-frequency cepstral coefficients (MFCCs) are one of the most widely used features for representing speech signals in ASR systems. Although they are not the only option, they are known to help achieve remarkable results compared to other features, which prompted us to use them in our experiments.

Model

The model is a convolutional neural network with 3 stacked convolutional layers of 64, 32, and 32 channels (feature maps), respectively. Each convolutional layer is followed by batch normalization and a 2 × 2 max-pooling layer. These layers are followed by a dropout layer with a drop probability of 0.3 and a fully connected feed-forward layer with 64 hidden units.
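To get a feel for the sizes involved, the sketch below traces feature-map shapes through the three convolution and pooling stages. Only the channel counts, the 2 × 2 pooling, and the 64 hidden units come from the description. The input size (40 MFCC coefficients × 98 frames) and the "same" padding are assumptions made purely for illustration.

```python
def trace_shapes(height, width, channels=(64, 32, 32), pool=2):
    """Return (channels, height, width) after each conv + batch norm + pool stage."""
    shapes = []
    for c in channels:
        # Assumption: 'same'-padded convolution, so only pooling shrinks the map.
        # Batch normalization does not change the shape.
        height, width = height // pool, width // pool
        shapes.append((c, height, width))
    return shapes

# Hypothetical input: 40 MFCC coefficients x 98 frames.
shapes = trace_shapes(40, 98)
print(shapes)  # [(64, 20, 49), (32, 10, 24), (32, 5, 12)]

c, h, w = shapes[-1]
flat_features = c * h * w
print(flat_features)  # 1920 inputs to the fully connected layer

HIDDEN_UNITS = 64  # from the model description
dense_params = flat_features * HIDDEN_UNITS + HIDDEN_UNITS
print(dense_params)  # 122944 weights and biases in that layer
```

Dropout has no parameters and leaves the shape unchanged, so it does not appear in the trace.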

Contributors

abdelhakeem
