Gender Prediction with Big 5 Personality Traits

In personality psychology there is a widely accepted model known as the "Big 5 Personality Traits", which, as the name suggests, breaks human personality down along 5 axes:

  • Openness to experience : inventive/curious vs. consistent/cautious
  • Conscientiousness : efficient/organized vs. easy-going/careless
  • Extraversion : outgoing/energetic vs. solitary/reserved
  • Agreeableness : friendly/compassionate vs. challenging/detached
  • Neuroticism : sensitive/nervous vs. secure/confident

Using simple self-reported adjective/characteristic tests, a percentile score can be determined for each trait. These percentile scores can then be used as features in classification algorithms to predict sex (which is also reported in the test).
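As a concrete illustration of percentile scores acting as features, the sketch below swaps in plain logistic regression (base R's `glm()`); it is not one of the notebook's own models. The column names `O_pct` through `N_pct`, the helper `predict_gender_glm`, and the toy data are all hypothetical. The gender codes follow data.csv (1 = male, 2 = female), and the toy labels are random, so they say nothing about real differences between groups.

```r
# Hypothetical sketch: logistic regression on the five trait percentiles.
# Assumes rows with gender 0 (no response) or 3 (other) were filtered out.
predict_gender_glm <- function(train, test) {
  train$is_female <- as.integer(train$gender == 2)  # 1 = male, 2 = female
  fit <- glm(is_female ~ O_pct + C_pct + E_pct + A_pct + N_pct,
             data = train, family = binomial)
  p <- predict(fit, newdata = test, type = "response")
  ifelse(p > 0.5, 2L, 1L)  # map probabilities back to gender codes
}

# Toy data for illustration only (random numbers, not the real dataset)
set.seed(1)
n <- 200
toy <- data.frame(O_pct = runif(n, 0, 100), C_pct = runif(n, 0, 100),
                  E_pct = runif(n, 0, 100), A_pct = runif(n, 0, 100),
                  N_pct = runif(n, 0, 100))
toy$gender <- ifelse(runif(n) < plogis((toy$A_pct + toy$N_pct - 100) / 20),
                     2L, 1L)

preds <- predict_gender_glm(toy[1:150, ], toy[151:200, ])
mean(preds == toy$gender[151:200])  # share of toy test rows predicted correctly
```

Logistic regression is used here only because it ships with base R; any of the classifiers installed in the setup step could take the same percentile columns as input.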

The large dataset used for model training and testing comes from the Open Source Psychometrics Project [raw_data], which "provides a collection of interactive personality tests with detailed results that can be taken for personal entertainment or to learn more about personality assessment".

At the time of writing this README, data.csv contains 19719 responses:

library(dplyr)

# Count responses per gender code (see codebook.txt for what each code means)
data %>%
    group_by(gender) %>%
    summarise(n_rows = n())

# A tibble: 4 x 2
  gender n_rows
   <int>  <int>
1      0     24 # no response
2      1   7608 # male
3      2  11985 # female
4      3    102 # other

The test responses are initially stored in a 57-column data frame (the author's own responses are shown below as an R vector) that also holds extraneous, though still interesting, data on race, age, dominant hand, ISO country code, and more. The levels and possible responses of these columns are documented in codebook.txt.

# "CAN" must be quoted; because it is a string, c() stores every answer as character
author <- c(3,21,1,1,1,2,"CAN",
            2,2,4,3,4,2,2,2,4,2, # openness
            1,4,2,2,1,1,1,2,2,3, # conscientiousness
            2,3,1,2,2,3,2,3,2,3, # extraversion
            3,2,4,1,4,1,4,2,4,4, # agreeableness
            4,1,4,1,3,2,5,4,4,4) # neuroticism
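To show how this vector maps onto the five traits, the sketch below drops the seven demographic fields and sums each block of ten answers, following the trait comments above. It is an illustration only: it sums the raw answers as written (no reverse-keying of any item), and the names `items`, `traits` and `raw_sums` are hypothetical.

```r
author <- c(3,21,1,1,1,2,"CAN",
            2,2,4,3,4,2,2,2,4,2, # openness
            1,4,2,2,1,1,1,2,2,3, # conscientiousness
            2,3,1,2,2,3,2,3,2,3, # extraversion
            3,2,4,1,4,1,4,2,4,4, # agreeableness
            4,1,4,1,3,2,5,4,4,4) # neuroticism

# Entries 1-7 are demographic fields; entries 8-57 are the 50 item answers
items <- as.numeric(author[8:57])

trait_names <- c("openness", "conscientiousness", "extraversion",
                 "agreeableness", "neuroticism")
traits <- factor(rep(trait_names, each = 10), levels = trait_names)

# Sum each block of ten answers; split() keeps the factor level order
raw_sums <- sapply(split(items, traits), sum)
raw_sums
# openness 27, conscientiousness 19, extraversion 23,
# agreeableness 29, neuroticism 32
```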

The answers to each trait's questions are summed, and a percentile is then calculated for each trait relative to all other responses. After some unexciting data transformation, the data is split into train and test sets, which are then fed to the following algorithms:
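The summing, percentile and splitting steps can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the notebook's code: it assumes item columns named by trait letter plus item number (O1 to O10, C1 to C10, and so on; check codebook.txt for the real names), it defines a percentile as the share of responses scoring at or below a given score (the empirical CDF), and it sums raw answers exactly as described, without reverse-keying. The helpers `trait_sum`, `to_percentile`, `score_traits` and `split_train_test` are hypothetical, and the 80/20 split and seed are arbitrary.

```r
# Hypothetical helper: sum the ten item columns for one trait prefix
trait_sum <- function(df, prefix) {
  rowSums(df[, paste0(prefix, 1:10)])
}

# Percentile on a 0-100 scale: share of responses scoring at or below
# each score, via the empirical CDF
to_percentile <- function(x) {
  100 * ecdf(x)(x)
}

# Add an <trait>_pct column for each of the five traits
score_traits <- function(df) {
  for (t in c("O", "C", "E", "A", "N")) {
    df[[paste0(t, "_pct")]] <- to_percentile(trait_sum(df, t))
  }
  df
}

# Random train/test split; the 80/20 ratio and the seed are arbitrary
split_train_test <- function(df, train_frac = 0.8, seed = 42) {
  size <- floor(train_frac * nrow(df))
  stopifnot(size >= 1, size < nrow(df))  # both sets must be non-empty
  set.seed(seed)
  idx <- sample(seq_len(nrow(df)), size = size)
  list(train = df[idx, ], test = df[-idx, ])
}

# Toy usage: 4 made-up respondents who answer every item with 1, 2, 3 or 4
toy <- as.data.frame(matrix(rep(1:4, times = 50), nrow = 4))
names(toy) <- paste0(rep(c("O", "C", "E", "A", "N"), each = 10), 1:10)
scored <- score_traits(toy)
scored$O_pct  # 25 50 75 100
parts <- split_train_test(scored)
```

Ties share the same percentile under this definition; a rank-based definition such as `dplyr::percent_rank()` would give slightly different values at the extremes.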

Setup + Running

First, download and install R 3.5.x from https://cran.rstudio.com/

Next, download and run the RStudio installer for your platform from the organization's download page.

Open RStudio and run through the initial configuration steps, such as selecting your CRAN mirror (see the earlier link for more information on the role of CRAN mirrors in the R ecosystem).

With the IDE set up, fork or download the repository, making sure the file structure is kept intact. Once this is complete, open big-5-gender-predictor-notebook.Rmd.

The final step is to copy the code block below into the console (don't worry, it's only installing the packages the report needs to run):

# Install every package the notebook depends on in a single call
install.packages(c("mda", "MASS", "klaR", "nnet", "kernlab",
                   "caret", "e1071", "tidyverse", "keras"))

There will be some minor conflicts between these packages and their dependencies (R reports them as functions being masked), but these can be ignored.
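If you rerun the setup later, a small check can skip packages that are already installed. This is a minimal sketch: `missing_packages` is a hypothetical helper that treats any package `requireNamespace()` cannot load as missing, and the install step only runs in an interactive session such as the RStudio console.

```r
# Packages the notebook depends on (same list as the install step above)
pkgs <- c("mda", "MASS", "klaR", "nnet", "kernlab",
          "caret", "e1071", "tidyverse", "keras")

# Hypothetical helper: return the packages that cannot be loaded
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Only install when run by hand, e.g. from the RStudio console
if (interactive()) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
}
```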

And you're done! Press Ctrl + Shift + Enter to run the current code chunk (the blocks wrapped in ```{r} ```), or Ctrl + Alt + P to run all chunks above the cursor.

Adding Your Test Data

Appending your personal data for the algorithms to predict on is quite simple:

  1. Open your local copies of ./data/custom_data.csv and ./codebook.txt

  2. Using the .csv headers as a guide, follow the commentary in the codebook and answer every personality and extraneous question, separating answers with a single comma.

Note: every one of the 58 columns must be given a value, otherwise errors will be thrown due to inconsistent row lengths. The only questions that require a valid answer, however, are the 50 columns prefixed with O, C, E, A, or N, plus the gender column.

  3. Run the report as explained above in Setup + Running; your custom predictions will be available at the bottom of the report alongside those of other custom users!
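Steps 1 and 2 can also be done from R instead of a text editor. This is a minimal sketch, assuming custom_data.csv has a single comma-separated header line and already ends with a newline; `append_custom_row` is a hypothetical helper, not part of the repository. It compares the row length against the header first, which guards against the inconsistent row sizing described in the note above.

```r
# Hypothetical helper: append one comma-separated answer row to a CSV
# such as ./data/custom_data.csv, refusing rows of the wrong length.
# Assumes the file already ends with a newline.
append_custom_row <- function(path, values) {
  header <- strsplit(readLines(path, n = 1), ",", fixed = TRUE)[[1]]
  if (length(values) != length(header)) {
    stop(sprintf("expected %d values, got %d", length(header), length(values)))
  }
  cat(paste(values, collapse = ","), "\n", file = path, append = TRUE, sep = "")
  invisible(path)
}

# Example call (my_answers is a hypothetical vector holding one value per column):
# append_custom_row("./data/custom_data.csv", my_answers)
```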

Contributors

christophersparling
