sm_w19's Introduction

188.501 Similarity Modeling 1 - WS 2019

Complete student data

  • Johannes Bogensperger - 01427678
  • Benjamin Ottersbach - 01576922

Finding Kermit

The two subprojects in this repository, "Kermit_Optical_Recog_VGG16" and "SimModelAudio", contain programs and data to recognize Kermit the Frog via images and sound samples taken from three episodes of the Muppet Show. This is a binary problem with the two possible states "Kermit is present" and "Kermit is NOT present".

The first project, "Kermit_Optical_Recog_VGG16", is based on a CNN for image recognition and is the final architecture we arrived at after trying simple approaches from internet tutorials. For this final architecture, we chose VGG-16, proposed by K. Simonyan and A. Zisserman (Oxford University) in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. This CNN is well suited to transfer learning because it comes already trained on a wide variety of images, such as those in the ImageNet dataset. Therefore we only needed to train the last layers. To adapt the model to our problem, we put two dense layers on top of it, plus a dropout layer to prevent overfitting.
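The architecture described above could be sketched in Keras roughly as follows. This is a minimal sketch, not the code from Kermit_Optical_Recog_VGG16/main.py: the function name, the 224x224 input size, the hidden layer width, the dropout rate, and the optimizer are all assumptions chosen for illustration.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16


def build_kermit_classifier(weights="imagenet", input_shape=(224, 224, 3),
                            hidden_units=256, dropout_rate=0.5):
    """Frozen VGG-16 base with a small trainable binary head (hypothetical sizes)."""
    # Convolutional part of VGG-16 without its original ImageNet classifier.
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False  # keep the pretrained filters fixed

    inputs = layers.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = layers.Flatten()(x)
    x = layers.Dense(hidden_units, activation="relu")(x)  # first dense layer
    x = layers.Dropout(dropout_rate)(x)                   # guards against overfitting
    outputs = layers.Dense(1, activation="sigmoid")(x)    # P("Kermit is present")

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Because the base is frozen, only the two dense layers are updated during training, which matches the idea of training just the last layers.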

With the second project, "SimModelAudio", we experienced far greater problems. After following some tutorials for speaker recognition, we came to the conclusion that LSTM classification layers on top of the Mel-frequency cepstral coefficients (MFCCs) of the audio data are not capable of providing sufficient results. We found it crucial to reduce the number of coefficients returned by librosa from 20 down to 13/7, since the last coefficients contain mostly noise. After CNNs didn't deliver the desired results either, we chose a simple Random Forest as classifier. Due to long runtimes, we had to restrict our hyperparameter search space drastically and stored the intermediate MFCCs before model training. Since the results of our approaches were still not satisfactory, we tried various sampling techniques and ended up with SMOTE upsampling.
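To illustrate what SMOTE upsampling does to the minority class ("Kermit is present"), here is a simplified, dependency-free sketch of its core idea: new samples are interpolated between a minority sample and one of its nearest minority neighbours. It is not the implementation used in SimModelAudio; the function name and parameters are hypothetical, and each sample is assumed to be a fixed-length feature vector such as averaged MFCCs.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points between minority samples and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen sample (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy example with three 2-dimensional "feature vectors".
kermit_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
extra = smote_like_oversample(kermit_features, n_new=10, k=2)
print(len(extra))
```

Only the training data would be oversampled this way; the test data keeps its original class distribution.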

We have prepared two datasets for learning to identify Kermit. Adapting DATA_DIR1 in the code selects the dataset the classifier is trained on.

DATA_DIR1 = 'data/kermit/' # A dataset containing a refined set of very pure samples of Kermit's voice without background noise, music, audience laughter etc.
DATA_DIR1 = 'data/kermit_big/' # The full dataset of all audio files initially labelled as "Kermit present", may be considered ground truth

While the "pure" dataset is preferred in terms of quality, it only contains 1 min 50 s worth of samples.

Setup of the environment and entry point of the code

Our image and audio samples are contained in compressed "data" folders for the projects in the GitHub release "V1.0". Unzip the files:

  • "data_for_audio_recognition_kermit.zip" in SimModelAudio/data/..
  • "data_for_visual_rocognition_kermit.zip" in Kermit_Optical_Recog_VGG16/data/.. (create the data folder first, since Git does not track empty folders)

and run the corresponding main methods:

  • "Kermit_Optical_Recog_VGG16" - Kermit_Optical_Recog_VGG16/main.py
  • "SimModelAudio" - SimModelAudio/main.py

Furthermore, the trained image classification Keras model can be found in the project folder (models/final_model) and loaded via keras.models.load_model.

The Python environments can be built via the requirements.txt files in each project folder.

Performance indicators (e.g. Recall, Precision, etc.)

Image detection

Final results for the test set:

  • F1 Score: 0.9806835066864783
  • Accuracy: 0.9887737478411054

Confusion Matrix    Predicted Not present    Predicted Present
Not present         815                      8
Present             5                        330
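The performance indicators can be recomputed directly from a confusion matrix. The following is a small sketch, assuming that rows are the true labels and columns are the predictions, and that "Present" is the positive class; the function name is hypothetical.

```python
def binary_metrics(tn, fp, fn, tp):
    """Accuracy, precision, recall and F1 from the four confusion matrix cells."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)  # share of "Present" predictions that are correct
    recall = tp / (tp + fn)     # share of true "Present" samples that are found
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Image detection test set: 815 / 8 in the first row, 5 / 330 in the second.
image = binary_metrics(tn=815, fp=8, fn=5, tp=330)
for name, value in image.items():
    print(f"{name}: {value:.4f}")
```

With the image detection numbers, the accuracy and F1 score computed this way agree with the values listed above.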

ROC Curve

[Figure: ROC curve for image detection]

Audio detection

KERMIT - BIG

The final results for the big and "dirty" dataset, which is highly unbalanced and uses all Kermit samples (not just the pure ones):

  • Accuracy: 0.84762839385018
  • F1 Score: 0.4400096176965617

Confusion Matrix    Predicted Not present    Predicted Present
Not present         12041                    1072
Present             1257                     915

ROC Curve

[Figure: ROC curve for audio detection, KERMIT - BIG]

KERMIT - SMALL/PURE

Even though the training sets are always upsampled via SMOTE, we couldn't achieve better results on the original, highly unbalanced distribution of the test data.

  • Accuracy: 0.9537098560354375
  • F1 Score: 0.3953712632594021

Confusion Matrix    Predicted Not present    Predicted Present
Not present         12713                    412
Present             215                      205
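On a test set this unbalanced, accuracy alone says little, which a majority-class baseline makes visible. The sketch below assumes that rows of the confusion matrices are true labels and columns are predictions; the function name is hypothetical.

```python
def majority_baseline_accuracy(tn, fp, fn, tp):
    """Accuracy of a trivial classifier that always predicts the larger class."""
    negatives = tn + fp  # true "Not present" samples
    positives = fn + tp  # true "Present" samples
    return max(negatives, positives) / (negatives + positives)


# Cells taken from the two audio detection confusion matrices.
audio_sets = {
    "KERMIT - BIG": dict(tn=12041, fp=1072, fn=1257, tp=915),
    "KERMIT - SMALL/PURE": dict(tn=12713, fp=412, fn=215, tp=205),
}
for name, cells in audio_sets.items():
    print(name, round(majority_baseline_accuracy(**cells), 4))
```

For both audio test sets, always answering "Not present" would already score slightly above the reported accuracy, so the F1 score is the more informative figure here.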

ROC Curve

[Figure: ROC curve for audio detection, KERMIT - SMALL/PURE]

Timesheets

Our timesheet can be found online: https://docs.google.com/spreadsheets/d/18DE5sUamwnyQ6VUXzKJwsusqlUtsCt6sKx07ir5BiHI/edit?usp=sharing
