
ds_salary_opt's Introduction

Data Scientist Salary Across United States

In this project, I want to know how Data Scientist salaries vary across the United States. Actual salaries are not publicly available, but the salaries reported in H1B visa applications show how much employers pay Data Scientists.

The data is available in the H1B Salary Database provided by the Department of Labor, but the site does not provide an API or CSV files to download. Given the link to a results page, we can scrape its HTML with BeautifulSoup. Once the data is prepared, we can visualize it with Plotly.
Note that the data is only an approximation.

Data Acquisition

1 - Data Scientist salary
The Data Scientist salaries based on H1B applications can be found on the Department of Labor website: H1B Database
The site does not offer an API or CSV download, so I used BeautifulSoup in Getdata.py to extract the data from the HTML page and export it to CSV for EDA as a first step. Let's call this the salary dataset.
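Getdata.py itself is not reproduced here, so the following is only a minimal sketch of the scraping step. It assumes the results page presents the salaries as a plain HTML table; the column names and the sample row are invented for illustration. To stay runnable without extra installs it uses the standard library's `html.parser` rather than BeautifulSoup, where the same loop would be written over `soup.find_all("tr")`.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical page fragment; the real page layout may differ.
SAMPLE_HTML = """
<table>
  <tr><th>EMPLOYER</th><th>JOB TITLE</th><th>BASE SALARY</th><th>LOCATION</th></tr>
  <tr><td>EXAMPLE CORP</td><td>DATA SCIENTIST</td><td>120,000</td><td>SAN FRANCISCO, CA</td></tr>
</table>
"""

print(table_to_csv(SAMPLE_HTML))
```

In a real run the HTML would come from an HTTP request to the results page, and the CSV text would be written to a file for the EDA step.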

2 - Longitude and Latitude of US cities
To plot on a map with Plotly, we need longitude and latitude, which are not available in the salary dataset, so we need to obtain them from another site: OpenDataSoft
This site has an API that returns the latitude and longitude of the 1000 largest US cities as JSON. Let's call this the location dataset.
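As a rough sketch of how the JSON response could be flattened into rows, the snippet below parses a hand-written sample payload. The structure (`records`, `fields`, `coordinates` as a latitude/longitude pair) and the field names are assumptions about the response shape, and the coordinates are rounded illustrative values, so check them against the actual API output.

```python
import json

# Hand-written sample in an assumed response shape, not real API output.
SAMPLE_RESPONSE = """
{"records": [
  {"fields": {"city": "San Francisco", "state": "California",
              "coordinates": [37.77, -122.42]}},
  {"fields": {"city": "Austin", "state": "Texas",
              "coordinates": [30.27, -97.74]}}
]}
"""


def parse_locations(payload):
    """Flatten the JSON payload into one dict per city."""
    rows = []
    for record in json.loads(payload)["records"]:
        fields = record["fields"]
        latitude, longitude = fields["coordinates"]
        rows.append({
            "city": fields["city"],
            "state": fields["state"],
            "latitude": latitude,
            "longitude": longitude,
        })
    return rows


for row in parse_locations(SAMPLE_RESPONSE):
    print(row)
```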

3 - US states by full name and abbreviation
The state columns in the salary dataset and the location dataset are not consistent, so I copied the list of state full names and abbreviations from the USPS site. (Since there are only around 50-60 rows, it was faster to copy and paste into Excel, though I shouldn't be lazy.)
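One way to use such a list is as a lookup table that normalizes both state columns to the same form before joining. The sketch below is an assumption about how that could look, not the project's actual code; it includes only three states, and `to_abbreviation` is a hypothetical helper name.

```python
# A small excerpt of the full-name -> abbreviation list; the real table
# would cover every row copied from the USPS site.
STATE_ABBREVIATIONS = {
    "California": "CA",
    "New York": "NY",
    "Texas": "TX",
}

# Case-insensitive lookup by full name, plus the set of known abbreviations.
_BY_NAME = {name.lower(): abbr for name, abbr in STATE_ABBREVIATIONS.items()}
_ABBREVIATIONS = set(STATE_ABBREVIATIONS.values())


def to_abbreviation(state):
    """Return the two-letter code for a full state name or an existing code."""
    cleaned = state.strip()
    if cleaned.upper() in _ABBREVIATIONS:
        return cleaned.upper()
    # Raises KeyError for names missing from the table, which surfaces
    # unmatched rows instead of silently dropping them.
    return _BY_NAME[cleaned.lower()]


print(to_abbreviation("California"))
print(to_abbreviation(" new york "))
print(to_abbreviation("tx"))
```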

Visualization

Once all the data was ready, I joined the salary and location datasets and computed the mean salary by location. After that, I plotted it with Plotly.
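The join and the mean can be sketched as follows. Viz.py is not shown here and may well do this with pandas (a merge followed by a group-by); this version uses only the standard library, and the rows, column names, and the unmatched city are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Illustrative rows only; the column names are assumptions.
salaries = [
    {"city": "San Francisco", "state": "CA", "salary": 120000},
    {"city": "San Francisco", "state": "CA", "salary": 140000},
    {"city": "Austin", "state": "TX", "salary": 100000},
    {"city": "Smallville", "state": "KS", "salary": 90000},  # no location match
]
locations = [
    {"city": "San Francisco", "state": "CA", "latitude": 37.77, "longitude": -122.42},
    {"city": "Austin", "state": "TX", "latitude": 30.27, "longitude": -97.74},
]


def mean_salary_by_location(salaries, locations):
    """Inner-join on (city, state) and average the salaries per location."""
    coords = {(loc["city"].lower(), loc["state"]): loc for loc in locations}
    grouped = defaultdict(list)
    for row in salaries:
        key = (row["city"].lower(), row["state"])
        if key in coords:  # rows without coordinates cannot be plotted
            grouped[key].append(row["salary"])
    return [
        {**coords[key], "mean_salary": mean(values)}
        for key, values in grouped.items()
    ]


for point in mean_salary_by_location(salaries, locations):
    print(point)
```

Matching on city and state together avoids mixing up cities that share a name across states.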
You may find the visualization below:
Screenshot

Alternatively, you may interact with the map.
For the plot, I used the code on Plotly
All the steps I mentioned in this section are done in Viz.py
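For readers without access to Viz.py, here is a hedged sketch of what the plotting step could look like. It builds the figure as a plain dictionary in Plotly's figure format with a `scattergeo` trace, so it runs without Plotly installed; the function name, marker settings, title, and sample points are all assumptions, not the project's actual choices.

```python
import json


def build_salary_map(points):
    """Build a Plotly-style figure dict with one marker per location."""
    return {
        "data": [{
            "type": "scattergeo",
            "lon": [p["longitude"] for p in points],
            "lat": [p["latitude"] for p in points],
            "text": [
                f'{p["city"]}, {p["state"]}: ${p["mean_salary"]:,.0f}'
                for p in points
            ],
            "mode": "markers",
            "marker": {
                "size": 10,
                "color": [p["mean_salary"] for p in points],
                "colorscale": "Viridis",
                "colorbar": {"title": {"text": "Mean salary (USD)"}},
            },
        }],
        "layout": {
            "title": {"text": "Mean Data Scientist salary by city"},
            "geo": {"scope": "usa"},
        },
    }


# Illustrative points in the shape produced by the join step.
points = [
    {"city": "San Francisco", "state": "CA", "latitude": 37.77,
     "longitude": -122.42, "mean_salary": 130000},
    {"city": "Austin", "state": "TX", "latitude": 30.27,
     "longitude": -97.74, "mean_salary": 100000},
]

figure = build_salary_map(points)
print(json.dumps(figure["layout"], indent=2))
```

With Plotly installed, passing this dictionary to `plotly.graph_objects.Figure(figure)` and calling `.show()` should render the interactive map.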

ds_salary_opt's People

Contributors

jacquessham
