We investigate common laboratory test results collected from suspected Covid-19 patients in order to predict Covid-19 infection. To do so, we build supervised and unsupervised machine learning models on a dataset obtained from Hospital Israelita Albert Einstein in São Paulo, Brazil. We compare the outcomes of the supervised learning models, explore the impact of data engineering and feature engineering on those outcomes, and discuss the limitations of the experiment.
The team first came up with a basic dataset that required minimal feature engineering and kept each training model as simple as possible. We studied each individual outcome, compared it with our hypothesis, and tried to understand why some expected trends or results were not achieved. The training models and accuracy scores were then recalculated using more refined models and/or with more complexity introduced, so that the final outputs used for discussion are as consistent as possible, with minimal standard deviation.
- Time Series - growth trends of different countries/regions
- Supervised Machine Learning
  - Impact of data engineering & feature engineering (FE)
  - Logistic Regression result and resampling
  - KNN result and resampling
  - Decision Tree Model Result
  - Neural Network Result
  - Conclusion (model that yields the best outcome)
- Unsupervised Machine Learning
  - K-Means
  - Hierarchical Clustering
  - DBSCAN
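
The consistency check described in the introduction (recomputing accuracy scores and looking for a small standard deviation) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: synthetic two-feature data stands in for the hospital dataset, a hand-written KNN classifier stands in for the supervised models, and every function name here is hypothetical.

```python
import random
import statistics
from collections import Counter


def make_synthetic_data(n_per_class=60, seed=0):
    """Two Gaussian blobs standing in for lab-test features.

    ASSUMPTION: this is synthetic data, NOT the hospital dataset.
    """
    rng = random.Random(seed)
    rows = []
    for label, centre in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            features = [rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)]
            rows.append((features, label))
    rng.shuffle(rows)
    return rows


def knn_predict(train, point, k=5):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(rows, k_neighbors=5, n_folds=5):
    """Return one accuracy score per fold of an n_folds-fold cross-validation."""
    scores = []
    for fold in range(n_folds):
        # Every n_folds-th row (offset by `fold`) is held out for testing.
        test = rows[fold::n_folds]
        train = [r for i, r in enumerate(rows) if i % n_folds != fold]
        correct = sum(knn_predict(train, x, k_neighbors) == y for x, y in test)
        scores.append(correct / len(test))
    return scores


rows = make_synthetic_data()
scores = cross_val_accuracy(rows)
mean_acc = statistics.mean(scores)
std_acc = statistics.stdev(scores)  # spread of accuracy across folds
print(f"accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```

Reporting the spread alongside the mean makes it easier to tell whether a difference between two models, or between two feature-engineering choices, is larger than the fold-to-fold variation.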