Side projects in data science
(1) Iris data: find relationships between the measured quantities and develop a model. Generate a random data set and test the models.
(3) Use SQLite through Python to query a movie rating database that has three different tables. Go through various exercises to query different parameters.
(4) Breast Cancer II: develop a supervised model to predict whether cells from a biopsy are malignant or benign, using a different dataset and parameters than the first analysis. Implement a random forest model, balancing the training data with SMOTE. Calculate cross-validation scores, the confusion matrix, and variable importance (and variability).
(5) Pricing test: pricing optimization, evaluating data from a pricing test. Data cleaning, trend plots, conclusions, and recommendations.
(6) Bike share analysis. Analytics, finding trends.
(7) NYC Taxi: Airline market share calculated from NYC taxi drop-off traffic at airports. Unsupervised learning. Implementation of various clustering algorithms.
(8) CS Challenge: unit test
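
For project (3), a minimal sketch of querying a three-table movie rating database with Python's built-in sqlite3 module. The schema (movie, reviewer, rating), the column names, the helper function names, and the sample rows are all assumptions made up for illustration; the actual database used in the project is not described here.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
        CREATE TABLE reviewer (reviewer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE rating (
            reviewer_id INTEGER REFERENCES reviewer(reviewer_id),
            movie_id INTEGER REFERENCES movie(movie_id),
            stars INTEGER
        );
        -- Made-up sample rows, for illustration only.
        INSERT INTO movie VALUES (1, 'Movie A', 1990), (2, 'Movie B', 2005);
        INSERT INTO reviewer VALUES (1, 'Reviewer X'), (2, 'Reviewer Y');
        INSERT INTO rating VALUES (1, 1, 4), (2, 1, 2), (1, 2, 5);
        """
    )
    return conn


def average_stars(conn, min_year):
    """Average rating per movie released in or after min_year (two-table join)."""
    query = """
        SELECT m.title, AVG(r.stars) AS avg_stars, COUNT(*) AS n_ratings
        FROM movie AS m
        JOIN rating AS r ON r.movie_id = m.movie_id
        WHERE m.year >= ?
        GROUP BY m.movie_id
        ORDER BY avg_stars DESC
    """
    # The "?" placeholder lets sqlite3 bind the parameter safely.
    return conn.execute(query, (min_year,)).fetchall()


def ratings_by_reviewer(conn, name):
    """Titles and stars given by one reviewer (join across all three tables)."""
    query = """
        SELECT m.title, r.stars
        FROM rating AS r
        JOIN reviewer AS v ON v.reviewer_id = r.reviewer_id
        JOIN movie AS m ON m.movie_id = r.movie_id
        WHERE v.name = ?
        ORDER BY m.title
    """
    return conn.execute(query, (name,)).fetchall()


conn = build_demo_db()
print(average_stars(conn, 1980))
print(ratings_by_reviewer(conn, "Reviewer X"))
```

Changing the arguments (for example the minimum year or the reviewer name) is one way to "query different parameters" without rewriting the SQL.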