188.501 Similarity Modeling 1 - WS 2019

Complete student data

Johannes Bogensperger - 01427678
Benjamin Ottersbach - 01576922

Finding Kermit

The two subprojects in this repository, "Kermit_Optical_Recog_VGG16" and "SimModelAudio", contain programs and data for recognizing Kermit the Frog in images and sound samples taken from three episodes of The Muppet Show. This is a binary classification problem with two possible states: "Kermit is present" and "Kermit is NOT present".

The first project, "Kermit_Optical_Recog_VGG16", is based on a CNN for image recognition and is the final architecture we settled on after trying simpler approaches from internet tutorials. For this final architecture we chose VGG-16, proposed by K. Simonyan and A. Zisserman (Oxford University) in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. This CNN is well suited to transfer learning because it is already trained on large image collections such as the ImageNet dataset. Therefore we only needed to train the last layers. To adapt the model to our problem, we put two dense layers and a dropout layer (to prevent overfitting) on top of it.
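The transfer-learning setup described above can be sketched roughly as follows. This is a minimal illustration, not the code from main.py: the function name build_kermit_classifier, the 224x224 input size, the dense layer width of 256, the dropout rate of 0.5 and the Adam optimizer are all assumptions chosen for the example.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_kermit_classifier(input_shape=(224, 224, 3), weights="imagenet"):
    """Frozen VGG-16 base with a small trainable head for a binary output."""
    # Convolutional base only; the original ImageNet classifier head is dropped.
    base = keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=input_shape
    )
    base.trainable = False  # keep the pretrained filters fixed

    inputs = keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)  # first dense layer (assumed width)
    x = layers.Dropout(0.5)(x)  # dropout against overfitting (assumed rate)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # P("Kermit is present")

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"]
    )
    return model
```

Calling build_kermit_classifier() with the default arguments downloads the ImageNet weights on first use. Because the base is frozen, only the two dense layers on top are updated during training.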

With the second project, "SimModelAudio", we ran into far greater problems. After following some tutorials for speaker recognition, we concluded that LSTM classification layers on top of the Mel-frequency cepstral coefficients (MFCCs) of the audio data do not deliver sufficient results. We found it crucial to reduce the number of coefficients returned by librosa from 20 down to 13 or 7, since the last coefficients contain mostly noise. After CNNs also failed to deliver the desired results, we chose a simple Random Forest as the classifier. Because of long runtimes, we had to restrict our hyperparameter search space drastically and store the intermediate MFCCs before model training. Since the results of these approaches were still not satisfactory, we tried various sampling techniques and ended up with SMOTE upsampling.

We have prepared two datasets for learning to identify Kermit. Setting DATA_DIR1 in the code selects the dataset the classifier is trained on.

DATA_DIR1 = 'data/kermit/' # A refined dataset of very pure samples of Kermit's voice without background noise, music, audience laughter, etc.
DATA_DIR1 = 'data/kermit_big/' # The full dataset of all audio files initially labelled as "Kermit present"; may be considered ground truth

While the "pure" dataset is preferable in terms of quality, it contains only 1 min 50 s worth of samples.

Setup of the environment and entry point of the code

Our image and audio samples are contained in compressed "data" folders attached to the GitHub release "V1.0". Unzip the files:

"data_for_audio_recognition_kermit.zip" in SimModelAudio/data/..
"data_for_visual_rocognition_kermit.zip" in Kermit_Optical_Recog_VGG16/data/.. (create the data folder first; Git does not track empty folders, and we did not add a .keep or dummy file)

and run the corresponding main methods:

"Kermit_Optical_Recog_VGG16" - Kermit_Optical_Recog_VGG16/main.py
"SimModelAudio" - SimModelAudio/main.py

Furthermore, the trained image classification Keras model can be found in the project folder (models/final_model) and loaded via keras.models.load_model.

The Python environments can be built from the requirements.txt file in each project folder.

Performance indicators (e.g. Recall, Precision, etc.)

Image detection

Final results for the test set:

F1 Score: 0.9806835066864783
Accuracy: 0.9887737478411054

Confusion Matrix	Predicted Not present	Predicted Present
Not present	815	8
Present	5	330
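Precision and recall are not listed separately, but they follow directly from the confusion matrix. The sketch below derives them, treating "Present" as the positive class and the table rows as the actual labels; the helper name binary_metrics is an assumption for this example.

```python
def binary_metrics(tn, fp, fn, tp):
    """Derive standard binary metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of "Present" predictions that are right
    recall = tp / (tp + fn)  # share of actual "Present" samples that are found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Counts taken from the image detection confusion matrix above.
image_metrics = binary_metrics(tn=815, fp=8, fn=5, tp=330)
for name, value in image_metrics.items():
    print(f"{name}: {value:.4f}")
```

For the image model this gives a precision of about 0.976 and a recall of about 0.985, and it reproduces the F1 score and accuracy reported above.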

ROC Curve

Audio detection

KERMIT - BIG

The final results for the big and "dirty" dataset, which is highly unbalanced and uses all Kermit samples (not just the pure ones):

Accuracy: 0.84762839385018
F1 Score: 0.4400096176965617

Confusion Matrix	Predicted Not present	Predicted Present
Not present	12041	1072
Present	1257	915

ROC Curve

KERMIT - SMALL/PURE

Even though the training sets are always upsampled via SMOTE, we could not achieve better results on the original, highly unbalanced distribution of the test data.

Accuracy: 0.9537098560354375
F1 Score: 0.3953712632594021

Confusion Matrix	Predicted Not present	Predicted Present
Not present	12713	412
Present	215	205
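Because both audio test sets are dominated by the "Not present" class, accuracy on its own says little. As a hedged illustration, the sketch below compares the reported accuracy with the accuracy of always predicting the majority class, using the counts from the two audio confusion matrices; the function and variable names are assumptions for this example.

```python
def majority_baseline_accuracy(tn, fp, fn, tp):
    """Accuracy obtained by always predicting the more frequent class."""
    negatives = tn + fp  # actual "Not present" samples
    positives = fn + tp  # actual "Present" samples
    return max(negatives, positives) / (negatives + positives)


# Counts taken from the two audio confusion matrices above.
audio_results = {
    "KERMIT - BIG": {"tn": 12041, "fp": 1072, "fn": 1257, "tp": 915},
    "KERMIT - SMALL/PURE": {"tn": 12713, "fp": 412, "fn": 215, "tp": 205},
}

summary = {}
for name, cm in audio_results.items():
    total = sum(cm.values())
    summary[name] = {
        "accuracy": (cm["tn"] + cm["tp"]) / total,
        "baseline": majority_baseline_accuracy(**cm),
        "recall": cm["tp"] / (cm["tp"] + cm["fn"]),
    }
    print(name, {k: round(v, 3) for k, v in summary[name].items()})
```

The majority baseline is about 0.858 for the big dataset and about 0.969 for the pure one, which is why the F1 score is the more informative figure for the audio models.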

ROC Curve

Timesheets

Our timesheet can be found online: https://docs.google.com/spreadsheets/d/18DE5sUamwnyQ6VUXzKJwsusqlUtsCt6sKx07ir5BiHI/edit?usp=sharing
