This repo contains the code and CSV files for each experimental set, as well as a link to the data collection tool we created. The code includes the files we used to preprocess our dataset, the CNN model we built for the accents, and the code to train the neural network. The goal of this project was to create an accent classifier that can correctly classify different Caribbean accents. We used a 2D CNN as our neural network, and we had to collect data from various Caribbean countries ourselves since there were few available resources with recordings of Caribbean accents. The Caribbean countries/islands we used were Trinidad, Tobago, St. Lucia, and Barbados; we chose these because we were able to collect the most recordings from them.
The code for our data collection system is available at https://github.com/Kerschel/Dataset-Speech-Collector. This code is hosted on a DigitalOcean cloud server and can be accessed at https://myaccent.app.
The code in the prediction branch is also used as an API endpoint to perform accent predictions inside the myaccent.app application, which is not currently hosted but can be run locally by starting the Flask server file.
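To illustrate the shape of such a prediction endpoint, here is a minimal hypothetical Flask sketch. The route name, upload field, and response format are assumptions for illustration; the actual server file in the prediction branch may differ, and the real handler would run the uploaded audio through MFCC extraction and the trained CNN.

```python
# Hypothetical sketch of a minimal prediction endpoint; the real route,
# field names, and model-loading code in the repo may differ.
from flask import Flask, request, jsonify

app = Flask(__name__)

# The four accents covered by this project (from the README).
ACCENTS = ["Trinidad", "Tobago", "St. Lucia", "Barbados"]

@app.route("/predict", methods=["POST"])
def predict():
    # In the real app the uploaded audio would be converted to MFCCs and
    # passed through the trained CNN; here we only show the request shape.
    audio = request.files.get("audio")
    if audio is None:
        return jsonify({"error": "no audio file uploaded"}), 400
    # prediction = model.predict(mfcc_features(audio))  # placeholder
    return jsonify({"accent": ACCENTS[0]})

if __name__ == "__main__":
    app.run(debug=True)
```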
The files for the dataset can be accessed at Experimental Datasets.
Each CSV file links the accent (HomeCountry) to a corresponding filename. We also added the duration of each file so we could filter out files that were not useful to us, i.e. very short recordings that people submitted by error. This filtering is done in trainmodel.py on lines 154-161.
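The duration-based filtering can be sketched as below. The column names follow the description above (HomeCountry plus filename and duration), but the exact cutoff and column names used in trainmodel.py are assumptions here.

```python
# Sketch of the duration-based filtering described above; the actual
# threshold and column names in trainmodel.py may differ.
import pandas as pd

MIN_DURATION = 2.0  # assumed cutoff (seconds) for "very short" recordings

# Stand-in for one of the experimental-set CSV files.
df = pd.DataFrame({
    "filename": ["a.wav", "b.wav", "c.wav"],
    "HomeCountry": ["Trinidad", "Barbados", "St. Lucia"],
    "duration": [5.2, 0.4, 3.1],
})

# Keep only recordings long enough to be useful for training.
usable = df[df["duration"] >= MIN_DURATION]
print(list(usable["filename"]))  # the 0.4 s clip is dropped
```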
Before running the code, all dependencies must first be installed from requirements.txt. Also ensure that you have Python installed to run the following commands:
pip install -r requirements.txt
To run the project, use: python trainmodel.py "experimental set csvfile" "modelname" "csvfilename for accuracy results"
- For example:
python trainmodel.py ES1.csv test_model output.csv
Depending on which experimental set you wish to use, the directory name needs to be changed in helpers.py on line 42 to reflect the corresponding folder (ES1, ES2, ES3).
The CSV file with the validation loss and accuracy will be written to the folder after running the code.
trainmodel.py - the main file that brings all the functions together to perform the classification.
helpers.py - contains the functions used for MFCC processing, as well as the selection of which audio files are loaded with librosa.
accuracy.py - used after prediction to determine how accurate the classifications were and to create a confusion matrix to display the results.
getsplit.py - used to add or remove accents used in the classifier, and to increase or decrease the training and test set sizes.
duration.py - used to find the length of each audio file and save it in a CSV file.
Used for ES2 alone:
ratereduce.py - used to make the audio files slower in order to extract the silence-noise-silence periods from the waveform with extractwords.py.
originalrate.py - returns the audio files to their original rate after they were lowered previously.
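The slow-down/restore round trip can be sketched with the standard-library wave module: writing the same samples at a lower frame rate plays them back slower, and rewriting them at the original rate undoes it. The rates and file names are assumptions; the actual scripts may use different values or libraries.

```python
# Hypothetical sketch of the ratereduce.py / originalrate.py round trip.
import struct
import wave

ORIGINAL_RATE = 44100
SLOW_RATE = ORIGINAL_RATE // 2  # half speed, as an example

# One second of 16-bit mono silence stands in for a real recording.
frames = struct.pack("<" + "h" * ORIGINAL_RATE, *([0] * ORIGINAL_RATE))

# ratereduce.py step: rewrite the same frames at the lower rate.
with wave.open("slow.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(SLOW_RATE)
    w.writeframes(frames)

# originalrate.py step: read the frames back and restore the rate.
with wave.open("slow.wav", "rb") as r:
    data = r.readframes(r.getnframes())
with wave.open("restored.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(ORIGINAL_RATE)
    w.writeframes(data)

with wave.open("restored.wav", "rb") as r:
    print(r.getframerate(), r.getnframes())  # 44100 44100
```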
Used for ES2 and ES3:
padding.py - adds silence to the end of the audio files to make them a uniform length.
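The padding idea is straightforward to sketch: append zero samples ("silence") until every clip reaches a common target length. padding.py likely works on audio files rather than arrays, and the target length below is an assumption.

```python
# Sketch of silence padding to a uniform length, as described for
# padding.py; the actual target length in the repo may differ.
import numpy as np

def pad_to_length(signal, target_len):
    # Append zeros ("silence") when the clip is shorter than the target;
    # longer clips are left unchanged in this sketch.
    if len(signal) >= target_len:
        return signal
    return np.concatenate([signal, np.zeros(target_len - len(signal))])

clip = np.ones(3)           # a 3-sample stand-in for a short recording
padded = pad_to_length(clip, 5)
print(padded)  # [1. 1. 1. 0. 0.]
```

Uniform lengths matter because the 2D CNN expects every input to have the same shape.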
After training our CNN, the best average accuracies we achieved on each experimental set were as follows:
ES1 - 38.9% on 4 accents
ES2 - 41.25% on 4 accents
ES3 - 68% on 4 accents, 83.3% on 3 accents and 84.1% on 2 accents