COVID-19-Cough-Classification-phase-1

Check out my 2 YOUTUBE channels for more:

  1. Mrzaizai2k - AI (NEW)
  2. Mrzaizai2k (old)

This project was created to distinguish between COVID and non-COVID patients based on cough sound. Because this is a non-invasive, low-cost, and simple procedure, it could be used as a first filter before testing with PCR or other methods. In phase 1, I set out to understand the challenges of this approach, the dataset, and how to deal with audio files, and, for the first time, to combine SMOTE with K-fold cross validation.

I'm sorry, but I don't have a '.py' file for you this time. But, as always, you may test on my Kaggle notebook by clicking HERE.

To make amends, I'll go over what I did on this project in further detail:

1. Dataset

The dataset I used here is from the AICovidVN115m contest. It includes 669 positive cases (16.45% of the total) and 3399 negative cases. I didn't extract the features from the .wav files myself because that isn't really necessary at this stage. Instead, I took the raw features that had already been extracted in this repo

2. Primary features

The features extracted in this project were MFCCs, the Mel-frequency spectrogram, and chroma. Each one is summarized by its mean value, computed with np.mean.
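As a rough illustration of the mean-value step, the sketch below averages frame-level features into one fixed-length vector. It assumes the mean is taken over the time axis and uses random arrays as stand-ins for real MFCC, Mel spectrogram, and chroma matrices; the helper name mean_pool and the sizes 40, 128, and 12 are illustrative assumptions, not values taken from this project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the frame-level features of one cough clip,
# each with shape (n_features, n_frames). Sizes are assumptions.
mfcc = rng.normal(size=(40, 120))
mel = rng.random(size=(128, 120))
chroma = rng.random(size=(12, 120))


def mean_pool(*feature_matrices):
    """Average each matrix over time (axis=1) and concatenate the results."""
    return np.concatenate([np.mean(m, axis=1) for m in feature_matrices])


# One 1D vector per clip, regardless of how long the clip is.
feature_vector = mean_pool(mfcc, mel, chroma)
print(feature_vector.shape)
```

Averaging over time is what turns clips of different lengths into rows of equal length, at the cost of discarding how the sound evolves within a cough.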

3. Scaling data

"Just to give you an example — if you have multiple independent variables like age, salary, and height; With their range as (18–100 Years), (25,000–75,000 Euros), and (1–2 Meters) respectively, feature scaling would help them all to be in the same range, for example- centered around 0 or in the range (0,1) depending on the scaling technique."

Reference: https://www.atoti.io/when-to-perform-a-feature-scaling/
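To make the quoted example concrete, here is a minimal sketch of the two scaling techniques it mentions, written with plain NumPy. The three rows of age, salary, and height values are made up, and the function names are hypothetical; the source does not say which scaler this project used.

```python
import numpy as np

# Columns: age (years), salary (euros), height (meters). Values are made up.
X = np.array([
    [18.0, 25_000.0, 1.60],
    [45.0, 50_000.0, 1.75],
    [100.0, 75_000.0, 2.00],
])


def standardize(X):
    """Center each column at 0 and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def min_max(X):
    """Rescale each column to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)


X_std = standardize(X)   # columns centered around 0
X_mm = min_max(X)        # columns in the range [0, 1]
```

In a real pipeline the column statistics would normally be computed on the training split only and then reused to transform the validation and test splits.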

4. Imbalanced data

As you can see, the dataset is imbalanced: 669 positive cases (16.45% of the total) versus 3399 negative cases. There are a lot of proposed methods for dealing with this, such as:

  • Oversampling
  • Undersampling
  • Hybrid over- and undersampling
  • Gathering more data
  • Data augmentation (I will use it for my next Covid classification phase 2)
    • Time stretch
    • Pitch shift
    • Gain
    • Background noise and so on...

Reference: https://phamdinhkhanh.github.io/2020/02/17/ImbalancedData.html#45-thu-th%E1%BA%ADp-th%C3%AAm-quan-s%C3%A1t

In this project, I try to resolve the imbalance by oversampling with SMOTE. For me the result is not really good, because SMOTE synthesizes new feature vectors and we can't tell whether those vectors correspond to coughs that could occur in real life. I guess in phase 2 I will try gain and background noise to oversample the dataset.

However, SMOTE helped a lot with training time when using early stopping.

Reference: https://machinelearningmastery.com/smote-oversampling-for-imbalanced-classification/
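To show where the synthetic features come from, the sketch below implements the core idea of SMOTE in NumPy: pick a minority sample, pick one of its nearest minority neighbours, and place a new point somewhere on the line between them. This is a simplified illustration on toy data, not the code used in the notebook; the function name smote_sketch and all sizes are assumptions, and the imbalanced-learn library provides a full implementation.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)

    # Pairwise Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]   # k nearest minority neighbours

    base = rng.integers(0, len(X_min), size=n_new)   # starting sample for each new row
    pick = rng.integers(0, k, size=n_new)            # which neighbour to move toward
    nb = neighbours[base, pick]
    gap = rng.random(size=(n_new, 1))                # position along the segment
    return X_min[base] + gap * (X_min[nb] - X_min[base])


rng = np.random.default_rng(1)
X_pos = rng.normal(size=(30, 4))     # toy minority class
X_neg = rng.normal(size=(150, 4))    # toy majority class

# Add enough synthetic positives to match the number of negatives.
X_syn = smote_sketch(X_pos, n_new=len(X_neg) - len(X_pos))
X_pos_balanced = np.vstack([X_pos, X_syn])
```

Every synthetic row is a blend of two real minority rows, so it always stays inside the region the minority class already covers. That is also why it is hard to say whether such a row matches a cough that could really be recorded.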

5. Model

I use a simple ANN model with dropout layers (rate 0.5) to avoid overfitting. The model here is quite simple. I would prefer a CRNN with attention, or another more complex model, on a 2D dataset rather than a 1D dataset like this one. You know, it's just PHASE 1!
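For readers unfamiliar with dropout, here is a small NumPy sketch of what a dropout layer with rate 0.5 does to a layer's activations. It shows the common "inverted dropout" formulation, in which surviving units are rescaled during training and nothing is changed at inference time. The function is a hypothetical stand-in for illustration, not the layer from the notebook.

```python
import numpy as np


def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations                      # inference: pass through unchanged
    if rng is None:
        rng = np.random.default_rng()
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)    # rescale so the expected value is unchanged


h = np.ones((4, 1000))                           # toy hidden-layer activations
h_train = dropout(h, rate=0.5, rng=np.random.default_rng(0))
h_eval = dropout(h, rate=0.5, training=False)
```

With rate 0.5, roughly half of the units are silenced on each training step, which discourages the network from relying on any single unit.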

6. K fold cross validation

Here I use stratified K-fold on the oversampled data.

After all, I think K-fold is just a way to assess, in general, how good or bad the model is, which helps us tune the hyperparameters better.

Reference:

https://viblo.asia/p/lam-chu-stacking-ensemble-learning-Az45b0A6ZxY

https://www.machinecurve.com/index.php/2020/02/18/how-to-use-k-fold-cross-validation-with-keras/

https://github.com/SadmanSakib93/Stratified-k-fold-cross-validation-Image-classification-keras/blob/master/stratified_K_fold_CV.ipynb

https://miai.vn/2021/01/18/k-fold-cross-validation-tuyet-chieu-train-khi-it-du-lieu/
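The sketch below shows what "stratified" means in practice: each class is shuffled and dealt across the folds separately, so every validation fold keeps roughly the same positive rate as the full dataset. It uses the class counts from the Dataset section; the helper stratified_kfold and the choice of 5 folds are assumptions for illustration, and scikit-learn's StratifiedKFold is the usual library choice.

```python
import numpy as np


def stratified_kfold(y, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs whose folds keep the class ratio of y."""
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        # Deal this class's samples round-robin across the folds.
        fold_of[idx] = np.arange(len(idx)) % n_splits
    for fold in range(n_splits):
        yield np.flatnonzero(fold_of != fold), np.flatnonzero(fold_of == fold)


# Labels with the class counts reported in the Dataset section.
y = np.array([1] * 669 + [0] * 3399)

for train_idx, val_idx in stratified_kfold(y):
    # Fold size and share of positive cases in the validation fold.
    print(len(val_idx), round(float(y[val_idx].mean()), 4))
```

One design choice worth considering, though not confirmed as what the notebook does, is to oversample only the training indices of each fold. Otherwise, synthetic points interpolated from validation samples can end up in the training set and make the scores look better than they are.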

7. Result

Sorry for the picture resolution. You can visit my Kaggle notebook to see the figures more clearly.

Original data

              precision    recall  f1-score   support
Negative           0.94      0.94      0.94       680
Positive           0.70      0.68      0.69       134
accuracy                               0.90       814

Figure 1. Result with original data

Figure 2. AUC = 91% with original data

Oversampling data with SMOTE

              precision    recall  f1-score   support
Negative           0.94      0.97      0.96       680
Positive           0.83      0.70      0.76       134
accuracy                               0.93       814

Figure 3. Result with SMOTE

Figure 4. AUC = 93% with SMOTE

Oversampling data and k-fold cross validation

              precision    recall  f1-score   support
Negative           0.97      0.96      0.96       680
Positive           0.81      0.84      0.82       134
accuracy                               0.94       814

Figure 5. Result with SMOTE and K-fold cross validation

Figure 6. AUC = 95% with SMOTE and K-fold cross validation
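As a quick way to read the three reports together, the sketch below recomputes the Positive-class F1 score and the overall accuracy from the precision, recall, and support values printed above. The numbers are copied from the reports; the function and variable names are hypothetical. Because the inputs are already rounded to two decimals, a recomputed value can in general differ from the printed one by 0.01.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def accuracy(recalls, supports):
    """Overall accuracy as the support-weighted average of per-class recall."""
    return sum(r * s for r, s in zip(recalls, supports)) / sum(supports)


# (precision, recall) for Negative and Positive, copied from the three reports.
reports = {
    "original": [(0.94, 0.94), (0.70, 0.68)],
    "smote": [(0.94, 0.97), (0.83, 0.70)],
    "smote_kfold": [(0.97, 0.96), (0.81, 0.84)],
}
supports = [680, 134]   # Negative, Positive

summary = {}
for name, rows in reports.items():
    positive_f1 = f1(*rows[1])
    acc = accuracy([recall for _, recall in rows], supports)
    summary[name] = (round(positive_f1, 2), round(acc, 2))

for name, (pos_f1, acc) in summary.items():
    print(f"{name}: positive F1 = {pos_f1}, accuracy = {acc}")
```

Since only about one sample in six is positive, accuracy is dominated by the Negative class. The Positive-class recall and F1 are the more informative numbers for a screening tool.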
