skill-embeddings's Introduction

Semantic reasoning of skills in the domain of human resources

This repository contains the code for a Master's thesis conducted at the University of Koblenz-Landau together with talentsconnect AG.

The research aim is to compare three methods of embedding skills into a vector space: distributional (text-based), relational (ontology-based), and a hybrid approach using the Attract-Repel model. The paper is available upon request. With this repository you can train Attract-Repel and evaluate it on two datasets: an intrinsic one, which tests the skill embeddings against manual annotations, and an extrinsic one, which evaluates the performance of a similar-jobs task that takes the embeddings as input. The extrinsic data is provided by talentsconnect AG and can be shared upon request.
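To illustrate what an intrinsic comparison against manual annotations can look like, here is a minimal sketch. It is not taken from the repository: the metric (cosine similarity scored with Spearman rank correlation), the function names, and the toy vectors and ratings are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical toy data: 2-d skill vectors and manually annotated pair ratings.
embeddings = {
    "python": [1.0, 0.0],
    "java": [0.9, 0.1],
    "cooking": [0.0, 1.0],
}
annotated_pairs = [
    ("python", "java", 0.9),
    ("python", "cooking", 0.1),
    ("java", "cooking", 0.2),
]

model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in annotated_pairs]
gold_scores = [rating for _, _, rating in annotated_pairs]
correlation = spearman(model_scores, gold_scores)
```

A correlation close to 1 would mean the embedding space orders the skill pairs the same way the annotators did. The thesis itself may use a different metric.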

Attract-Repel training

First install the required libraries either with pip:

pip install -r requirements.txt

or with conda:

conda install --file requirements.txt

Attract-Repel is trained on word2vec vectors and uses linguistic constraints derived from the ESCO ontology. These are stored in attract-repel/word-vectors/init_google_we.txt and attract-repel/linguistic_constraints/similar_skills.txt, respectively. The file attract-repel/config/experiment_parameters.cfg contains the hyperparameters used in the grid search to find the best combination of attract_margin, batch_size and l2. To run Attract-Repel on a single set of hyperparameters only, write the same value in the first and second place for each hyperparameter.
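The grid search idea can be sketched as follows. This is an illustration only, not the repository's code: the (start, stop, step) layout, the expand helper, and the numeric values are assumptions, and the real format of experiment_parameters.cfg may differ. The sketch shows why repeating the same value in the first and second place collapses a hyperparameter to a single setting.

```python
from itertools import product

def expand(start, stop, step):
    """Return the inclusive range start..stop; a single value if start == stop."""
    if start == stop:
        return [start]
    count = int(round((stop - start) / step)) + 1
    # Rounding guards against floating-point drift such as 0.6000000000000001.
    return [round(start + i * step, 10) for i in range(count)]

# Hypothetical search ranges for the three hyperparameters named in the README.
grid = {
    "attract_margin": expand(0.4, 0.6, 0.1),  # three values
    "batch_size": expand(32, 64, 32),         # two values
    "l2": expand(1e-9, 1e-9, 0.0),            # same start and stop: one value
}

# Every combination of the expanded values is one training run.
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]

for params in combinations:
    print(params)
```

With these made-up ranges the search covers six runs. Fixing all three hyperparameters the same way as l2 would leave exactly one.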

Run the following command to start the training:

python attract-repel/code/attract_repel.py -c config_path -s save_model -e evaluation

Arguments:

  • config_path : path to the config file; the default is attract-repel/config/experiment_parameters.cfg
  • save_model : boolean, whether to store the model file in the results/grid_search folder; the default is False
  • evaluation : boolean, whether to evaluate the models during training; the default is True. To run the evaluation you need to provide 3 paths in the config file: gold_standard (path to the extrinsic set), companyDataset (path to the list of jobs used in the extrinsic evaluation) and skills_annotated_sample (path to the intrinsic evaluation set).
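For orientation, the arguments above could be declared with argparse roughly as below. This is a sketch under assumptions, not the actual contents of attract_repel.py: the long option names, the str_to_bool helper, and the accepted boolean spellings are hypothetical. Only the short flags and default values come from the description above.

```python
import argparse

def str_to_bool(text):
    """Parse a command-line string such as 'True' or 'false' into a bool."""
    lowered = text.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")

def build_parser():
    """Build a parser mirroring the three documented arguments."""
    parser = argparse.ArgumentParser(description="Attract-Repel training (sketch)")
    parser.add_argument(
        "-c", "--config_path",
        default="attract-repel/config/experiment_parameters.cfg",
        help="path to the config file",
    )
    parser.add_argument(
        "-s", "--save_model", type=str_to_bool, default=False,
        help="store the model file in results/grid_search",
    )
    parser.add_argument(
        "-e", "--evaluation", type=str_to_bool, default=True,
        help="evaluate the models during training",
    )
    return parser

# Example: keep the default config, save the model, skip evaluation.
args = build_parser().parse_args(["-s", "True", "-e", "False"])
```

Under these assumptions, calling the parser with no arguments yields the documented defaults: the default config path, save_model False, and evaluation True.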
