Income prediction

Author: Ayebare Gyavira

Employers often ask applicants what salary they would be comfortable with. The answer depends on a number of factors. Predicting income helps applicants estimate what a company is likely to pay and whether they can afford to work there, given the company's expectations and environment. This project looks at some of the features involved and how they affect whether someone earns above or below 50k.

Data Source:

Adult income dataset https://www.kaggle.com/datasets/wenruliu/adult-income-dataset

The original dataset has 48,842 rows and 15 columns.

Data Dictionary

Feature Name     Description
Age              The individual's age
Workclass        The individual's employer type, such as private, government, or self-employed
Fnlwgt           The number of people the census believes the entry represents
Education        The highest level of education the individual attained
Education-num    The education level encoded as a number
Marital status   The individual's current marital status
Occupation       The type of work the individual does
Relationship     The individual's role in the family
Sex              The individual's gender
Race             The individual's race, if they chose to disclose it
capital-gain     The individual's capital gains
capital-loss     The individual's capital losses
Hours-per-week   The number of hours the individual works per week
Native-country   The individual's country of origin
Income           Whether the individual earns above or below 50k. This is the target variable to be predicted

To prepare the data, it was cleaned and the following steps were performed:

Exploratory Data Analysis

  • Bar graphs were used to visualize the data for explanatory purposes.
  • The first bar graph shows how hours worked per week vary across working classes, split by whether the individual earns above 50k.
  • In every working class, those who earn above 50k work more hours per week than those who don't. The exception is people who have never worked: all of them fall below 50k, most likely because they have no employment income.
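The comparison behind this chart can be sketched as a grouped mean. The snippet below is a minimal illustration, assuming pandas and the Kaggle column names `workclass`, `hours-per-week`, and `income`. The sample rows and the helper name `mean_hours_by_workclass` are made up for demonstration and are not taken from the project.

```python
import pandas as pd

# Hypothetical sample rows; the real data would come from the Kaggle CSV.
sample = pd.DataFrame(
    {
        "workclass": ["Private", "Private", "Private",
                      "Local-gov", "Local-gov", "Never-worked"],
        "hours-per-week": [40, 50, 60, 35, 45, 10],
        "income": ["<=50K", ">50K", ">50K", "<=50K", ">50K", "<=50K"],
    }
)


def mean_hours_by_workclass(df: pd.DataFrame) -> pd.DataFrame:
    """Mean weekly hours for each working class, split by income bracket."""
    return df.pivot_table(
        index="workclass",
        columns="income",
        values="hours-per-week",
        aggfunc="mean",
    )


hours_table = mean_hours_by_workclass(sample)
print(hours_table)
# With matplotlib installed, hours_table.plot(kind="bar") draws the grouped bars.
```

A working class with no one above 50k, such as `Never-worked` in this sample, shows up as a missing value in the `>50K` column.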

workclass_hours

This bar graph shows that, at every education level, people who earn above 50k tend to be older than those who don't. It's most likely an advantage of being more experienced or holding more responsibilities.

education_age

Modeling

Metrics for the fine-tuned logistic regression model with Principal Component Analysis (PCA) applied

Training data

Screenshot 2023-09-30 at 22 02 16

Test data

Screenshot 2023-09-30 at 22 02 54
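The README reports the metrics as screenshots only, so the snippet below is a rough sketch of how a tuned logistic regression with PCA could be set up in scikit-learn, not the author's actual code. The scaling step, the parameter grid, the `f1` scoring, the `class_weight="balanced"` setting, and the synthetic data from `make_classification` are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded features, roughly 3:1 imbalanced.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6,
    weights=[0.75, 0.25], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale, reduce dimensionality with PCA, then fit logistic regression.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

# Assumed search space; the values are illustrative only.
param_grid = {
    "pca__n_components": [4, 8],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=3)
search.fit(X_train, y_train)

print(search.best_params_)
print(classification_report(y_test, search.predict(X_test)))
```

`classification_report` prints precision, recall, F1, and support per class, which is the same kind of table the screenshots show.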

From the stakeholder's perspective, a false positive (predicting an income above 50k when it is actually lower) means individuals might spend more than they can afford, for example on projects that end up going bankrupt. A false negative means individuals might restrict themselves to a tight budget even though they could afford to be more comfortable with their spending.
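To make the link between these two error types and the reported metrics concrete: false positives lower precision, and false negatives lower recall. The sketch below computes both for the above-50k class. The counts are hypothetical and are not the project's results, and `precision_recall` is an illustrative helper name.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall for the positive class (income above 50k)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 60 correct above-50k predictions,
# 20 false positives, and 40 false negatives.
precision, recall = precision_recall(tp=60, fp=20, fn=40)
print(f"precision={precision:.2f}, recall={recall:.2f}")
# prints "precision=0.75, recall=0.60"
```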

As the support column of each report shows, the dataset contains far fewer people with incomes above 50k, so the model had to be tuned to remedy that imbalance.
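The README does not say how the imbalance was handled. One common option is to weight each class inversely to its frequency, which is the formula scikit-learn applies for `class_weight="balanced"`: `n_samples / (n_classes * class_count)`. The sketch below computes those weights by hand. The label counts are hypothetical and `balanced_class_weights` is an illustrative helper name.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical 3:1 split between the two income brackets.
labels = ["<=50K"] * 750 + [">50K"] * 250
weights = balanced_class_weights(labels)
print(weights)
```

The minority class receives the larger weight (2.0 here, against roughly 0.67 for the majority), so errors on above-50k earners cost more during training.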

Recommendations:

  • I'd advise the stakeholders to keep a close eye on their incomes and avoid assumptions, since making more losses than initially anticipated would be alarming.
  • To secure their place in the above-50k income category, I'd advise the stakeholders to keep improving on the features above, such as education, because these are also taken into account.
