accentclassifier's Introduction

Native Language Classification

Overview

Using audio samples from The Speech Accent Archive, I wanted to show that a deep neural network can classify the native language of a speaker.

Table of Contents

  1. Dependencies
  2. Motivation
  3. Data
  4. Model
  5. Running Model
  6. Performance

Dependencies

Data

I started with the data from The Speech Accent Archive, a collection of 2447 audio samples from people from over 300 countries, all speaking the same English paragraph. The paragraph contains most of the consonants, vowels, and clusters of standard American English. Not every sample was useful, however: Finland, for example, contributes only 9 audio samples.

For this project, I focused on the countries with the most audio samples and on languages with distinctly different origins. I chose to work with English, Arabic, and Mandarin. After some filtering, and to keep the dataset balanced, I could use only 73 audio samples from each of the three languages.
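The balancing step described above can be sketched as downsampling every language to the size of the smallest class. This is only an illustration, not the project's actual code: the function name `balance_by_language`, the `(filename, language)` pair layout, and the fixed random seed are all assumptions.

```python
import random
from collections import defaultdict


def balance_by_language(samples, seed=0):
    """Downsample each language to the size of the smallest class.

    `samples` is a list of (filename, language) pairs. This layout is a
    hypothetical stand-in for the rows of the metadata file.
    """
    by_language = defaultdict(list)
    for filename, language in samples:
        by_language[language].append(filename)

    # The smallest class decides how many samples every class keeps.
    target = min(len(files) for files in by_language.values())

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    balanced = []
    for language in sorted(by_language):
        chosen = rng.sample(by_language[language], target)
        balanced.extend((filename, language) for filename in chosen)
    return balanced
```

Under this scheme, the language with the fewest usable recordings after filtering sets the per-class sample count for all of them.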

Model

The WAV audio files were converted into Mel-frequency cepstral coefficients (MFCCs).

MFCC

The MFCCs were fed into a two-dimensional convolutional neural network (CNN) to predict the native language class.

CNN

* Graph is for illustration purposes only.

Challenges & Solutions

  • Computationally expensive
    • Created an Amazon Web Services Elastic Compute Cloud (EC2) instance that allowed for splitting workload over 32 cores.
  • Small dataset
    • MFCCs were sliced into smaller segments, and the neural network made a prediction for each segment. The per-segment predictions were then combined by majority vote, a simple ensembling method, to predict the native language class.

ensembling
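The segment-and-vote idea can be sketched in plain Python as follows. This is a sketch under stated assumptions, not the project's implementation: the function names, the non-overlapping fixed-width slicing, and the `classify_segment` callback (a stand-in for the trained CNN) are all hypothetical, and the README does not say how wide the segments were or whether they overlapped.

```python
from collections import Counter


def slice_segments(mfcc, width):
    """Split an MFCC matrix into non-overlapping segments along the time axis.

    `mfcc` is a list of rows (one per coefficient), each a list of frames.
    Trailing frames that do not fill a whole segment are dropped.
    """
    n_frames = len(mfcc[0])
    segments = []
    for start in range(0, n_frames - width + 1, width):
        segments.append([row[start:start + width] for row in mfcc])
    return segments


def majority_vote(labels):
    """Return the most common label among the per-segment predictions."""
    if not labels:
        raise ValueError("need at least one segment prediction")
    # On a tie, Counter.most_common keeps the label that was seen first.
    return Counter(labels).most_common(1)[0][0]


def predict_recording(mfcc, width, classify_segment):
    """Classify every segment, then vote to label the whole recording.

    `classify_segment` is a hypothetical callable standing in for the CNN.
    """
    votes = [classify_segment(segment) for segment in slice_segments(mfcc, width)]
    return majority_vote(votes)
```

Slicing multiplies the number of training examples drawn from each recording, and voting lets a few misclassified segments be outweighed by the rest.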

Running Model

  ├── README.md
  ├── src
  │     ├── accuracy.py
  │     ├── fromwebsite.py
  │     ├── getaudio.py
  │     ├── getsplit.py
  │     ├── trainmodel.py
  ├── models
  │     ├── cnn_model138.h5
  ├── logs
  │     ├── events.out.tfevents.1506987325.ip-172-31-47-225
  └── audio

To download language metadata from The Speech Accent Archive and download audio files:
  1. Run fromwebsite.py to get language metadata and save data to bio_metadata.csv

Example:

python fromwebsite.py bio_metadata.csv mandarin english arabic

  2. Run getaudio.py to download audio files to the audio directory. All audio files listed in bio_metadata.csv will be downloaded.

Example:

python getaudio.py bio_metadata.csv

To filter audio samples to feed into the CNN:
  1. Edit the filter_df method in getsplit.py
    • This will filter audio files from bio_metadata.csv when calling trainmodel.py
To make predictions on audio files:
  1. Run trainmodel.py to train the CNN

Example:

python trainmodel.py bio_metadata.csv model50

  • Running trainmodel.py will save the trained model as model50.h5 in the models directory and output the results to the console.
  • This script will also save a TensorBoard log file into the logs directory.

Performance

The number of training MFCC segments varies with how many languages you use and how the parameters are tuned. When training my model, I had roughly 6500 training MFCC segments and validated my results on 44 unsegmented audio files.

In the three-language classification, the model predicted the correct native language for around 85% of the English samples, 57% of the Arabic samples, and 87% of the Mandarin samples.

Actual \ Predicted   English   Arabic   Mandarin
English                   12        1          1
Arabic                     1        8          5
Mandarin                   1        1         14
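The per-language figures can be recomputed from the confusion matrix, reading rows as the actual language and columns as the predicted one. The sketch below is an illustration only: the dictionary layout and the function names are assumptions, while the counts are copied from the table.

```python
# Rows are the actual language, columns the predicted language.
CONFUSION = {
    "English": {"English": 12, "Arabic": 1, "Mandarin": 1},
    "Arabic": {"English": 1, "Arabic": 8, "Mandarin": 5},
    "Mandarin": {"English": 1, "Arabic": 1, "Mandarin": 14},
}


def per_class_accuracy(confusion):
    """Fraction of each actual language's samples that were predicted correctly."""
    return {
        actual: row[actual] / sum(row.values())
        for actual, row in confusion.items()
    }


def overall_accuracy(confusion):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(row[actual] for actual, row in confusion.items())
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total
```

With these counts, `per_class_accuracy(CONFUSION)` gives 12/14 ≈ 0.857 for English, 8/14 ≈ 0.571 for Arabic, and 14/16 = 0.875 for Mandarin, and `overall_accuracy(CONFUSION)` gives 34/44 ≈ 0.773. The 44 samples in the table match the 44 validation files, and most of the Arabic errors are samples predicted as Mandarin.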
