

jobdescription2jobtitle readme

Introduction

Given a piece of text such as a CV, a job summary, or a LinkedIn profile, this program converts it to a 300d vector (the average of its word vectors) and ranks ONET job titles by their similarity to that text. ONET is a standard dataset of about 1100 job titles and their descriptions. It also includes other information about jobs that is not used here.

For each job title and its description, a 300d average word vector is built. Given a piece of text, the program finds the job titles most similar to that text.
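
The averaging and ranking steps can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the function names, the whitespace tokenization, and the choice to concatenate title and description before averaging are all assumptions. `vectors` can be any mapping from word to vector, for example a plain dict or a gensim `KeyedVectors` object.

```python
import numpy as np


def average_vector(text, vectors, dim=300):
    """Average the vectors of the whitespace tokens of `text` that have an embedding.

    Tokenization by str.split() is a simplification (assumption).
    """
    found = [np.asarray(vectors[tok], dtype=float) for tok in text.split() if tok in vectors]
    if not found:
        # No known words: fall back to a zero vector of the right size.
        return np.zeros(dim)
    return np.mean(found, axis=0)


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0
    return float(np.dot(a, b) / denom)


def rank_titles(text, jobs, vectors, dim=300, top_n=5):
    """Rank job titles by similarity to `text`.

    `jobs` maps title -> description (hypothetical structure).
    Returns up to `top_n` (title, similarity) pairs, most similar first.
    """
    query = average_vector(text, vectors, dim)
    scored = []
    for title, description in jobs.items():
        job_vec = average_vector(f"{title} {description}", vectors, dim)
        scored.append((title, cosine_similarity(query, job_vec)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

With the real 300d vectors, `rank_titles(cv_text, jobs, word_vectors)` would return the five closest titles; words missing from the vocabulary are simply skipped.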

The similarities (or distances) of a piece of text to the 1100 job titles form a 1100d vector. This vector can be compared with that of another piece of text, using the cosine distance between them, to see whether both pieces of text correspond to the same person.

If two pieces of text correspond to the same person, their distances to the 1100 job titles should be similar (the cosine distance between their 1100d vectors should be low).

The cosine distance between two pieces of text can be used as a single feature when trying to decide if two pieces of text correspond to a single person or not.
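
One possible way to compute this feature is sketched below. It is an illustration under assumptions, not the repository's implementation: `title_matrix` is a hypothetical array with one averaged job-title vector per row (about 1100 x 300 for the full ONET data), and both function names are made up for the example.

```python
import numpy as np


def similarity_profile(text_vec, title_matrix):
    """Cosine similarity of one text vector to every row of `title_matrix`.

    Returns a vector with one entry per job title (the text's "profile").
    """
    text_vec = np.asarray(text_vec, dtype=float)
    title_matrix = np.asarray(title_matrix, dtype=float)
    dots = title_matrix @ text_vec
    denom = np.linalg.norm(title_matrix, axis=1) * np.linalg.norm(text_vec)
    # Rows with a zero norm (or a zero text vector) get similarity 0.
    return np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)


def cosine_distance(p, q):
    """1 - cosine similarity; defined as 1.0 if either vector has zero norm."""
    denom = np.linalg.norm(p) * np.linalg.norm(q)
    if denom == 0:
        return 1.0
    return 1.0 - float(np.dot(p, q) / denom)
```

The single feature for a pair of texts is then `cosine_distance(similarity_profile(vec_a, title_matrix), similarity_profile(vec_b, title_matrix))`, where a low value suggests the two texts describe the same person.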

To run the program, gensim must be installed, the pre-trained Google word2vec file must be downloaded, and the path in the source changed accordingly.
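
As a rough sketch of the loading step, gensim's `KeyedVectors.load_word2vec_format` can read word2vec binary files. The helper name and the example path below are assumptions; use whatever file name and location the download actually has.

```python
from pathlib import Path


def load_word_vectors(path):
    """Load word2vec-format binary vectors with gensim.

    `path` is wherever the downloaded file was saved (assumption:
    the caller supplies it; the README only says it lives in "resources").
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"word vector file not found: {path}")
    # Imported here so the missing-file check above works even without gensim installed.
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(str(path), binary=True)
```

For example, `word_vectors = load_word_vectors("resources/GoogleNews-vectors-negative300.bin.gz")`, where the file name is hypothetical. Loading the full model takes a few gigabytes of memory.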

Pre-trained word vectors

Download the file from https://docs.google.com/uc?id=0B7XkCwpI5KDYNlNUTTlSS21pQmM&export=download and move it into the resources directory.

Job Title and Description

The job titles and descriptions can be downloaded from the ONET dataset: https://www.onetcenter.org/dl_files/database/db_21_0_text/Occupation%20Data.txt
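
A minimal sketch of reading that file follows. It assumes the file is tab-separated with a header row containing `Title` and `Description` columns; check the downloaded file before relying on this. The function name is hypothetical.

```python
import csv


def read_occupations(lines):
    """Parse tab-separated occupation data into a {title: description} dict.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumes a header row with "Title" and "Description" columns.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row["Title"]: row["Description"] for row in reader}
```

Typical usage would be `with open("resources/Occupation Data.txt", encoding="utf-8", newline="") as f: jobs = read_occupations(f)`, where the path is an assumption.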

Contact

Afshin Rahimi [email protected]
