amy-leaf / adult-data-set-classification-and-classifier-comparison
This project forked from tejaswaje/adult-data-set-classification-and-classifier-comparison
In this program we apply machine learning principles to predict whether income exceeds $50K per year on the Adult data set. We use four techniques to achieve better performance: choosing an appropriate classifier, preprocessing techniques, parallel infrastructure, and external libraries. The program focuses on the proper use of each classifier by fine-tuning its hyperparameters to achieve the best results. The classifiers are SVM, KNN, Random Forest, and Gaussian Naïve Bayes. The preprocessing techniques used to reduce noise and inconsistency in the data are the standard scaler, label encoder, and quantile transformer. Random Forest achieved the best accuracy, outperforming every other classifier; it builds many decision trees and combines their predictions, which helps prevent overfitting.
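The workflow described above (encode labels, scale features, tune each classifier, compare accuracy) can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code. It assumes scikit-learn is the external library in use, substitutes a synthetic data set from `make_classification` for the Adult data so the snippet runs offline, and uses hyperparameter grids whose values are placeholders chosen only for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in for the Adult data (14 numeric features, binary target).
X, y = make_classification(n_samples=600, n_features=14, n_informative=6, random_state=0)

# Mimic the string income labels, then encode them as integers with LabelEncoder.
labels = np.where(y == 1, ">50K", "<=50K")
y_enc = LabelEncoder().fit_transform(labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y_enc, test_size=0.25, stratify=y_enc, random_state=0
)

# ASSUMPTION: illustrative grids only; the project's tuned values are not given in the text.
# Parameter names are prefixed with the step name that make_pipeline assigns.
candidates = {
    "SVM": (SVC(), {"svc__C": [0.1, 1, 10]}),
    "KNN": (KNeighborsClassifier(), {"kneighborsclassifier__n_neighbors": [3, 7, 15]}),
    "Random Forest": (
        RandomForestClassifier(random_state=0),
        {"randomforestclassifier__n_estimators": [50, 100]},
    ),
    "Gaussian NB": (GaussianNB(), {"gaussiannb__var_smoothing": [1e-9, 1e-7]}),
}

N_JOBS = 1  # set to -1 to run the grid search on all CPU cores

results = {}
for name, (clf, grid) in candidates.items():
    # Scaling lives inside the pipeline so it is fit on training folds only.
    pipe = make_pipeline(StandardScaler(), clf)
    search = GridSearchCV(pipe, grid, cv=3, n_jobs=N_JOBS)
    search.fit(X_train, y_train)
    # Accuracy of the best configuration on the held-out test split.
    results[name] = search.score(X_test, y_test)

for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

On the real Adult data, the categorical feature columns would also need encoding before scaling, and `StandardScaler` could be swapped for `QuantileTransformer` in the pipeline to compare the two. The ranking printed here reflects the synthetic data only and says nothing about the results reported above.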
License: MIT License