Gender Prediction with Big 5 Personality Traits

In the field of clinical psychology there is a generally accepted model known as the "Big 5 Personality Traits" which, as the name suggests, breaks down human personality along 5 axes:

Openness to experience : inventive/curious vs. consistent/cautious
Conscientiousness : efficient/organized vs. easy-going/careless
Extraversion : outgoing/energetic vs. solitary/reserved
Agreeableness : friendly/compassionate vs. challenging/detached
Neuroticism : sensitive/nervous vs. secure/confident

Through the use of simple self-reported adjective/characteristic tests, a percentile score can be determined for each trait. These percentile scores can then be used as the feature set for classification algorithms that predict sex (which is also reported in the test).

The large dataset used for model training and testing is from the Open Source Psychometrics Project [raw_data] which:

provides a collection of interactive personality tests with detailed results that can be taken for personal entertainment or to learn more about personality assessment

At the time of writing this README there are 19719 responses present in data.csv:

library(dplyr)  # provides %>%, group_by(), summarise(), n()

data %>%
    group_by(gender) %>%
    summarise(n_rows = n())

# A tibble: 4 x 2
  gender n_rows
   <int>  <int>
1      0     24 # no response
2      1   7608 # male
3      2  11985 # female
4      3    102 # other

The test responses are initially stored in a 57-column data frame (the author's responses are shown below as an example) that includes extraneous, though still interesting to explore, data regarding race, age, dominant hand, ISO country code, etc. The levels and responses of these columns can be investigated further in codebook.txt.

author <- c(3,21,1,1,1,2,"CAN",  # demographics ("CAN" makes c() coerce every value to character)
            2,2,4,3,4,2,2,2,4,2, # openness
            1,4,2,2,1,1,1,2,2,3, # conscientiousness
            2,3,1,2,2,3,2,3,2,3, # extraversion
            3,2,4,1,4,1,4,2,4,4, # agreeableness
            4,1,4,1,3,2,5,4,4,4) # neuroticism

The answers to the questions for each trait are summed, and a percentile is then calculated for each trait relative to the other responses. After some unexciting data transformation, the data is split into train and test sets, which are then fed to the following algorithms:

Mixture Discriminant Analysis : [mda package]
Quadratic Discriminant Analysis : [MASS package]
Regularized Discriminant Analysis : [klaR package]
Neural Network : [nnet package]
Flexible Discriminant Analysis : [mda package]
Support Vector Machine : [kernlab package]
k-Nearest Neighbours : [caret package]
Naive Bayes : [e1071 package]
Additionally, a custom multi-layer ANN was developed using the keras package
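
The scoring and modelling steps described above can be sketched roughly as follows. This is a minimal illustration on randomly generated data, not the notebook's actual code: the item column names (O1 to O10, C1 to C10, and so on), the 80/20 split, and the choice of qda() from the MASS package as the example classifier are all assumptions made for the sketch.

```r
library(MASS)  # provides qda()

set.seed(42)
n <- 300
traits <- c("O", "C", "E", "A", "N")

# Hypothetical stand-in for the cleaned responses: 50 items scored 1-5,
# with assumed column names O1..O10, C1..C10, E1..E10, A1..A10, N1..N10.
items <- matrix(sample(1:5, n * 50, replace = TRUE), nrow = n,
                dimnames = list(NULL, paste0(rep(traits, each = 10), 1:10)))
gender <- factor(sample(c("male", "female"), n, replace = TRUE))

# Sum the ten items of each trait.
trait_sums <- sapply(traits, function(t) rowSums(items[, paste0(t, 1:10)]))

# Convert each sum to a percentile: the share of responses whose sum
# for that trait is at or below it.
trait_pct <- apply(trait_sums, 2, function(x) ecdf(x)(x) * 100)

scores <- data.frame(trait_pct, gender = gender)

# Hold out 20% of the rows for testing (assumed split ratio).
test_idx <- sample(n, size = 0.2 * n)
train <- scores[-test_idx, ]
test  <- scores[test_idx, ]

# Fit one of the listed classifiers and evaluate it on the held-out rows.
fit  <- qda(gender ~ ., data = train)
pred <- predict(fit, newdata = test)$class
accuracy <- mean(pred == test$gender)
table(predicted = pred, actual = test$gender)
```

Because the toy labels here are random, the accuracy should hover around chance; the point is only to show the shape of the pipeline from item sums to percentiles to a fitted classifier.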

Setup + Running

First, download and install R 3.5.x from https://cran.rstudio.com/

Next, download and run the RStudio installer for your platform from the organization's download page

Open RStudio and run through initial configuration steps such as selecting your CRAN mirror (see earlier link for more information on their place in the R ecosystem).

With the IDE set up, fork or download the repository, ensuring the file structure is maintained. Once this is complete, open big-5-gender-predictor-notebook.Rmd.

The final step is to copy the below codeblock into the console (don't worry, it's only installing the packages the report needs to run):

install.packages(c("mda", "MASS", "klaR", "nnet", "kernlab",
                   "caret", "e1071", "tidyverse", "keras"))

You may see some minor conflict messages between these packages and their dependencies, but these can be ignored.

And you're done! Hit Ctrl + Shift + Enter to execute the current code block (each is denoted by the ```{r} ``` wrapper), or Ctrl + Alt + P to execute all code blocks above the cursor.

Adding Your Test Data

Appending your personal data for the algorithms to predict on is quite simple:

Open your local copies of ./data/custom_data.csv and ./notebook.txt
Observing the headers of the .csv, follow the commentary in the codebook and answer all personality and extraneous questions, separating answers with a single comma.

Note: Every one of the 58 columns must be given a value; otherwise errors will be thrown due to inconsistent row sizing. The only questions that require a valid answer are the 50 columns prefaced by an O, C, E, A, or N, and the gender column.

Run the report as explained above in Setup + Running. Your custom set of predictions will be available at the bottom of the report, along with those of other custom users!
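
The column-count requirement in the note above can be checked before running the report. The sketch below is only an illustration: it writes a small temporary file with made-up column names (col1 to col58) in place of the real ./data/custom_data.csv, then uses count.fields() from base R to flag any line whose number of comma-separated values differs from the header.

```r
# Hypothetical stand-in for ./data/custom_data.csv: a header plus two rows.
path <- tempfile(fileext = ".csv")
writeLines(c(
  paste(paste0("col", 1:58), collapse = ","),  # header with 58 made-up names
  paste(rep(3, 58), collapse = ","),           # complete row
  paste(rep(3, 57), collapse = ",")            # row that is one value short
), path)

# Number of comma-separated fields on every line of the file.
n_fields <- count.fields(path, sep = ",")

# Line numbers whose field count differs from the header's.
bad_rows <- which(n_fields != n_fields[1])
bad_rows
```

Pointing `path` at your own copy of the file instead of the temporary one should list any rows that would cause the inconsistent row sizing error.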
