dataturks-engg / entity-recognition-in-resumes-spacy

Automatic Summarization of Resumes with NER -> Evaluate resumes at a glance through Named Entity Recognition

Home Page: https://medium.com/@dataturks/automatic-summarization-of-resumes-with-ner-8b97a5f562b

Python 100.00%
named-entity-recognition spacy-models resume-parser python annotation-tool labeling-tool text-annotation

entity-recognition-in-resumes-spacy's Issues

spaCy can't train on overlapping entities

ValueError: [E103] Trying to set conflicting doc.ents: '(549, 582, 'Designation')' and '(539, 581, 'Designation')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
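
A possible workaround, not taken from the repository: filter each record's entity list so that no two spans overlap before passing it to spaCy. The sketch below assumes spans are `(start, end, label)` tuples with end-exclusive character offsets and keeps the longer span when two collide. `drop_overlapping` is a hypothetical helper name, and which span to keep is a judgment call for your data. Spans that are disjoint in characters can still land on the same token, so this may not remove every E103.

```python
def drop_overlapping(entities):
    """Return a non-overlapping subset of (start, end, label) spans.

    Offsets are end-exclusive. Longer spans win; ties go to the earlier start.
    """
    kept = []
    # Visit the longest spans first, then the earliest start.
    for start, end, label in sorted(entities, key=lambda e: (-(e[1] - e[0]), e[0])):
        # Keep the span only if it is disjoint from everything kept so far.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


# The two spans from the error message above.
spans = [(549, 582, "Designation"), (539, 581, "Designation")]
print(drop_overlapping(spans))  # [(539, 581, 'Designation')]
```

Keeping the longest span is only one policy; preferring a particular label or the first span seen would work the same way with a different sort key.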

How to run this project

I don't know Python, and I'm not familiar with entity recognition or spaCy, so could you please guide me?
What do I need to install to run this? I want the resume parser functionality.

spaCy crashes

Hello,

Very nice work and great results.

Could you please help me with the spaCy part? It keeps crashing, and this seems to be a very common issue. How did you solve it?

Thanks a lot.

Code can't read the JSON file

The repo includes a JSON file, but when I run the code it can't read it.
In Jupyter Notebook the cell just keeps loading after the main function call, and when I run it in PyCharm it throws this error:

ERROR:root:Unable to process traindata.json
error = 'NoneType' object is not iterable
Traceback (most recent call last):
File "train.py", line 22, in convert_dataturks_to_spacy
for annotation in data['annotation']:
TypeError: 'NoneType' object is not iterable
Traceback (most recent call last):
File "train.py", line 123, in
train_spacy()
File "train.py", line 53, in train_spacy
for _, annotations in TRAIN_DATA:
TypeError: 'NoneType' object is not iterable

ERROR:root:Unable to process traindata.json
error = 'NoneType' object is not iterable
Traceback (most recent call last):
File "", line 12, in convert_dataturks_to_spacy
for annotation in data['annotation']:
TypeError: 'NoneType' object is not iterable
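
The traceback shows that `data['annotation']` is `None` for at least one record, so the loop over it fails and the training data ends up as `None` as well. A minimal sketch of a more tolerant converter follows, assuming the same `content` / `annotation` / `points` / `label` fields used by the repository's code. `convert_lines` is a hypothetical name, and treating a null `annotation` as "no entities" is an assumption about what you want for unlabelled records.

```python
import json


def convert_lines(lines):
    """Convert DataTurks JSON lines to (text, {"entities": [...]}) pairs."""
    training_data = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        data = json.loads(line)
        entities = []
        # `annotation` may be null for unlabelled records; treat it as empty.
        for annotation in data.get("annotation") or []:
            point = annotation["points"][0]
            labels = annotation["label"]
            if not isinstance(labels, list):
                labels = [labels]
            for label in labels:
                # DataTurks end offsets are inclusive; spaCy's are exclusive.
                entities.append((point["start"], point["end"] + 1, label))
        training_data.append((data["content"], {"entities": entities}))
    return training_data
```

To use it on a file, something like `convert_lines(open("traindata.json", encoding="utf-8"))` should work, since iterating over a file yields its lines.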

Accuracy Statistics

Hi,
Thanks for putting together this project. I have a question about the accuracy reporting. It seems that accuracy (and F1 scores, etc.) is reported only for the last resume rather than as aggregate scores over all resumes. Is that understanding correct?
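
For reference, one way to get aggregate numbers is to accumulate counts over all resumes and compute the scores once at the end. The sketch below is micro-averaged and counts a prediction as correct only on an exact `(start, end, label)` match. That is an assumption and may not match how the repository's script scores each resume. `micro_prf` is a hypothetical name.

```python
def micro_prf(gold_docs, pred_docs):
    """Micro-averaged precision, recall and F1 over exact span matches.

    Each argument is a list with one entry per document; each entry is a
    list of (start, end, label) tuples.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        gold_set, pred_set = set(gold), set(pred)
        tp += len(gold_set & pred_set)   # predicted and correct
        fp += len(pred_set - gold_set)   # predicted but wrong
        fn += len(gold_set - pred_set)   # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Summing counts before dividing weights every entity equally; averaging per-resume scores instead would weight every resume equally, which gives different numbers when resumes vary in length.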

Input to test the model

Is it possible to test the model with a personal PDF resume? Or is there a function to convert a PDF resume to the "DataTurks JSON format"?
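
A partial sketch for the second question, assuming the PDF's text has already been extracted by some other tool (PDF extraction itself is not shown here and is not part of the repository as far as this thread shows). It wraps plain text in one JSON line using the `content` and `annotation` field names that the repository's converter reads. `text_to_dataturks_line` is a hypothetical name, and the empty `annotation` list is an assumption for an unlabelled resume.

```python
import json


def text_to_dataturks_line(text):
    """Wrap plain resume text as one DataTurks-style JSON line (no labels)."""
    return json.dumps({"content": text, "annotation": []})


line = text_to_dataturks_line("John Doe\nSoftware Engineer at Oracle")
print(line)
```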

ValueError

ValueError: [E103] Trying to set conflicting doc.ents: '(1476, 1501, 'Designation')' and '(1476, 1485, 'Companies worked at')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

Info needed

Hi @DataTurks @abhishek-narayanan-dataturks,
I cloned this repository and ran it successfully without any errors. How do I give a resume as input? I was unable to find the relevant code; could you explain this?

Thanks and Regards,
Manikantha Sekhar..

Kernel died, restarting

Is there any solution for this?

Warning: Unnamed vectors -- this won't allow multiple vectors models to be loaded. (Shape: (0, 0))
Statring iteration 0

Kernel died, restarting

Only one label appears in the results

Hi,

I created a dataset with 5 labels (person, location, date, time, country) using DataTurks NER tagging, and I trained my model using the DataTurks-to-spaCy conversion and spaCy training scripts.

I get results, but only for the Person label. After re-running the script a few times, I sometimes get the date and person labels.

I also noticed that if I use 'testdata.json' (the project's test data), I get results for all labels, even though I trained the model on my own data.

I couldn't work out the reason for this.

Code: https://paste.ubuntu.com/p/7G28brTWgv/

Train File: https://paste.ubuntu.com/p/GGTFdGbHDN/
Test File: https://paste.ubuntu.com/p/29XWP6MRrp/

Read annotated data with Doccano

Hi,
How can I read data that I annotated with another tool, Doccano?
1) Here is the format of my annotated data:

    "annotation": [
        [
            79,
            99,
            "Nom complet"
        ],

2) The code that I want to change to read my annotated data:

    for line in lines:
        data = json.loads(line)
        text = data['content']
        entities = []
        for annotation in data['annotation']:
            #only a single point in text annotation.
            point = annotation['points'][0]
            labels = annotation['label']
            # handle both list of labels or a single label.
            if not isinstance(labels, list):
                labels = [labels]

            for label in labels:
                #dataturks indices are both inclusive [start, end] but spacy is not [start, end)
                entities.append((point['start'], point['end'] + 1 ,label))
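
A minimal sketch of a converter for the `[start, end, label]` triple format shown in 1), with several assumptions flagged: the key holding the text is not shown in this issue, so `text_key="text"` is a guess you may need to change; the end offsets are assumed to be exclusive already, so no `+ 1` is applied (check this against your export by slicing the text); and `triples_to_spacy` is a hypothetical name.

```python
import json


def triples_to_spacy(lines, text_key="text", spans_key="annotation"):
    """Convert JSON lines with [start, end, label] triples to spaCy-style pairs."""
    training_data = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        data = json.loads(line)
        # Assumes end offsets are already exclusive, so they are used unchanged.
        entities = [(start, end, label) for start, end, label in data.get(spans_key) or []]
        training_data.append((data[text_key], {"entities": entities}))
    return training_data
```

A quick sanity check is to print `text[start:end]` for a few entities; if the last character of each entity is missing, the offsets are inclusive and need `end + 1` as in the DataTurks code above.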

Saving/Loading Custom Dataset

Hi, I am trying to run inference with the given code. I get decent results when testing with testdata.json right after training with nlp.update(). The problem appears when I save the model to output_dir with nlp.to_disk() after training. When I load the trained model with nlp2.from_disk(output_dir) or nlp2 = spacy.load(output_dir) and then test with nlp2, I get very wrong results. I also noticed that output_dir contains a number of files and folders instead of a single file (in Keras, for example, a saved model is a single '.h5' file). Am I missing something here? I am relatively new to spaCy.

ERROR:root:Unable to process traindata.json

error = 'NoneType' object is not iterable
Traceback (most recent call last):
File "", line 22, in convert_dataturks_to_spacy
for annotation in data['annotation']:
TypeError: 'NoneType' object is not iterable

TypeError Traceback (most recent call last)
in
----> 1 train_spacy()

in train_spacy()
51
52 # add labels
---> 53 for _, annotations in TRAIN_DATA:
54 for ent in annotations.get('entities'):
55 ner.add_label(ent[2])

TypeError: 'NoneType' object is not iterable

I get this error after calling the function.

Read another form of annotated data

How can I read my annotated data? It looks like this:

    "annotation": [
        [
            79,
            99,
            "Nom complet"
        ],

This is the annotation format that the code expects:

    "annotation": [
        {
            "label": [
                "Companies worked at"
            ],
            "points": [
                {
                    "start": 1749,
                    "end": 1754,
                    "text": "Oracle"
                }
            ]
        },

ValueError: [E103]

I get the error below during training, even though I used the same code.

ValueError: [E103] Trying to set conflicting doc.ents: '(6861, 6870, 'Companies worked at')' and '(6305, 7258, 'Skills')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

E24 error

I'm getting the following error when I execute train.py

ValueError: [E024] Could not find an optimal move to supervise the parser. Usually, this means the GoldParse was not correct. For example, are all labels added to the model?

Anything I should be doing differently?
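
One cause that is commonly reported for E024 is entity spans that start or end on whitespace, which spaCy cannot align to token boundaries. Whether that is the cause here is not confirmed by this thread. A minimal sketch that trims such spans before training, assuming `(start, end, label)` tuples with end-exclusive offsets; `trim_entity_spans` is a hypothetical helper name.

```python
def trim_entity_spans(text, entities):
    """Strip leading/trailing whitespace from each (start, end, label) span."""
    cleaned = []
    for start, end, label in entities:
        # Move the start forward past any whitespace.
        while start < end and text[start].isspace():
            start += 1
        # Move the end backward past any whitespace.
        while end > start and text[end - 1].isspace():
            end -= 1
        if start < end:  # drop spans that were whitespace only
            cleaned.append((start, end, label))
    return cleaned


text = "Worked at Oracle  as engineer"
print(trim_entity_spans(text, [(9, 17, "Companies worked at")]))
# [(10, 16, 'Companies worked at')]
```

The error message's other suggestion, checking that every label in the data was added to the NER component, is worth ruling out as well.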
