aehabv / indeed-fake-job-posting-prediction

A machine learning model is built using PySpark's MLlib library to automatically flag suspicious job postings on Indeed.com. The dataset includes 18,000 job descriptions, out of which about 800 are fake.

License: Other


NLP in PySpark's MLlib - Fake Job Posting Predictions

Indeed.com has hired us to create a system that automatically flags suspicious job postings on its website. Due to the high volume of job postings, their employees do not have the capacity to check every posting, so they would like to prioritize which postings to review before deleting them. Our task is to use the attached dataset with NLP to create an algorithm that automatically flags suspicious posts for review.

Dataset

This dataset contains 18K job descriptions out of which about 800 are fake. The data consist of both textual information and meta-information about the jobs.
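The class counts above imply a heavily imbalanced problem. The following is a minimal sketch of that arithmetic in plain Python, using the approximate figures quoted above (about 18,000 postings, about 800 fake). The `class_balance` helper and the "balanced" weighting formula n / (2 * n_class) are illustrative assumptions; this README does not say whether the notebook applies class weights.

```python
def class_balance(n_total, n_fraud):
    """Return the fraud rate and 'balanced' per-class weights n / (2 * n_class)."""
    n_real = n_total - n_fraud
    fraud_rate = n_fraud / n_total
    weights = {
        "real": n_total / (2 * n_real),
        "fraud": n_total / (2 * n_fraud),
    }
    return fraud_rate, weights


# Approximate counts from the dataset description, not exact row counts.
rate, weights = class_balance(18_000, 800)
print(f"fraud rate: {rate:.1%}")
print(f"accuracy of always predicting 'real': {1 - rate:.1%}")
print(f"weights: real={weights['real']:.2f}, fraud={weights['fraud']:.2f}")
```

With these figures, a model that labels every posting as real would already be right about 95.6% of the time, so accuracy alone says little here; precision and recall on the fraudulent class are likely to be more informative.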

Data Source: https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction

The dataset has the following columns:

Column Name          Description
job_id               Unique identifier for each job posting
title                Job title
location             Location of the job
department           Department of the company
salary_range         Salary range of the job
company_profile      Description of the company
description          Description of the job
requirements         Requirements for the job
benefits             Benefits offered by the company
telecommuting        Whether the job allows telecommuting or not
has_company_logo     Whether the company has a logo or not
has_questions        Whether the job has questions for applicants or not
employment_type      Type of employment (full-time, part-time, etc.)
required_experience  Required experience for the job
required_education   Required education for the job
industry             Industry of the company
function             Function of the job
fraudulent           Whether the job posting is fraudulent or not

Prerequisites

Before running the code, you will need to have the following installed:

  • PySpark: the Python API for Apache Spark
  • Jupyter Notebook: an interactive development environment for Python
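One quick way to check that both prerequisites are importable is sketched below. The module names `pyspark` and `notebook` are the usual import names for these packages, and the `missing_packages` helper is hypothetical, written only for this example. Note that PySpark also needs a Java runtime, which this check does not cover.

```python
import importlib.util


def missing_packages(modules=("pyspark", "notebook")):
    """Return the names in `modules` that cannot be found for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    # Both packages are published on PyPI under these same names.
    print("Missing:", ", ".join(missing))
    print("Try: pip install " + " ".join(missing))
else:
    print("All prerequisites found.")
```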

Usage

To run the code, open the Fake_Job_Posting_Predictions.ipynb file in Jupyter Notebook and execute the cells in order. The notebook contains detailed explanations of each step in the code and the results obtained.
