Male or Female?
Choose the best Name Gender Classifier
We started with a shared template file that splits the data, so that all of our results would be comparable.
https://github.com/groupGit/IS620GroupProject3/blob/master/Project3_version_a.ipynb
We then created a chatbot on Slack.com to exchange ideas and experimented with different techniques and features. Our goal was to find the best machine learning classifier for the names corpus dataset. We used Slack to divide the work and GitHub to check in files as each classifier was finished.
The dataset is fairly small, with 7944 rows. It was split into three data frames, maintaining the same ratio of male to female names in each: a validation set with 500 rows, a test set with 500 rows, and a training set with 6944 rows. This split was the basis for all the classifiers. Our team built the classifiers listed below.
- Max Entropy: https://github.com/groupGit/IS620GroupProject3/blob/master/Project3_MaxEntropy.ipynb
- Random Forest: https://github.com/groupGit/IS620GroupProject3/blob/master/RandomForest.ipynb
- Decision Tree: https://github.com/groupGit/IS620GroupProject3/blob/master/DecisionTreeT2.ipynb
- Naive Bayes: a) https://github.com/groupGit/IS620GroupProject3/blob/master/naive_bayes_2a.ipynb b) https://github.com/groupGit/IS620GroupProject3/blob/master/naive_bayes_2.ipynb c) https://github.com/groupGit/IS620GroupProject3/blob/master/naive_bayes_2c.ipynb
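The three-way split that all of these classifiers share can be sketched roughly as follows. This is a minimal illustration and not the code from our template file: the function name `stratified_split`, the seed, and the placeholder data (5000 "female" and 2944 "male" rows, chosen only so the total is 7944) are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_split(labeled_names, sizes, seed=0):
    """Split (name, gender) pairs into fixed-size sets plus a remainder,
    keeping the gender ratio roughly the same in every set."""
    rng = random.Random(seed)

    # Group the rows by label and shuffle within each group.
    by_label = defaultdict(list)
    for name, gender in labeled_names:
        by_label[gender].append((name, gender))
    for rows in by_label.values():
        rng.shuffle(rows)

    # Each label's share of the full dataset, computed once up front.
    total = len(labeled_names)
    share = {label: len(rows) / total for label, rows in by_label.items()}

    splits = []
    for size in sizes:
        part = []
        for label, rows in by_label.items():
            # Take this label's proportional share of the fixed-size set.
            take = round(size * share[label])
            part.extend(rows[:take])
            del rows[:take]
        rng.shuffle(part)
        splits.append(part)

    # Whatever is left over becomes the final (training) set.
    remainder = [row for rows in by_label.values() for row in rows]
    rng.shuffle(remainder)
    splits.append(remainder)
    return splits

# Placeholder data (hypothetical names and class counts, for illustration only).
data = [(f"f{i}", "female") for i in range(5000)] + \
       [(f"m{i}", "male") for i in range(2944)]

validation, test, train = stratified_split(data, sizes=[500, 500])
print(len(validation), len(test), len(train))
```

Because of rounding, a fixed-size set can be off by a row for some class ratios, so the ratio is preserved only approximately.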
The results are summarized under each classifier. Overall, the feature set that used the first letter and the last letter of the name performed best.
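A first-letter and last-letter feature extractor might look like the sketch below. The function name `first_last_features`, the feature keys, and the lower-casing step are assumptions made for illustration; the individual notebooks may define their features differently.

```python
def first_last_features(name):
    """Return a feature dictionary with the first and last letter of a name."""
    cleaned = name.strip().lower()  # assumption: case is ignored
    if not cleaned:
        raise ValueError("name must not be empty")
    return {
        "first_letter": cleaned[0],
        "last_letter": cleaned[-1],
    }

# Pair each feature dictionary with its label (hypothetical example rows).
examples = [("Alice", "female"), ("Robert", "male")]
featuresets = [(first_last_features(n), g) for n, g in examples]
print(featuresets)
```

Lists of `(features, label)` pairs in this shape are what NLTK classifiers such as `nltk.NaiveBayesClassifier.train` accept as training input.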