Comments (4)
Perhaps name parsing and training is not a well-supported function of parserator.
I have tested probablepeople and was hoping for better accuracy when training on name formats in parserator.
from parserator.
Hey @renospec,
Thanks for waiting on this. Can you give me a better sense of what you're trying to do? If you're just looking to alter/improve the behavior of name parsing, you might find it easier to develop off of the probablepeople library and retrain the model yourself. We don't have probablepeople-specific docs for this yet, but the guide for usaddress should be nearly identical.
Thank you.
I have developed with the probablepeople library and have trained the model on my data.
After the training session, I started testing different name formats and found that formats not represented in the training data were not parsed successfully. So it seems that I have to train on every variation of name format!
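One rough way to see which test formats have no counterpart in the training data is to reduce each name to a coarse "shape" and compare the two sets. The sketch below is only a heuristic: the helper names `name_shape` and `uncovered_shapes` are hypothetical, they are not part of parserator or probablepeople, and the CRF model works on token features rather than on these shapes, so a missing shape does not by itself mean a parse will fail.

```python
import re
from collections import Counter


def name_shape(name):
    """Collapse a name string into a coarse format signature.

    Uppercase runs become "A", lowercase runs "a", digit runs "9";
    punctuation and spacing are kept as-is.
    """
    shape = re.sub(r"[A-Z]+", "A", name)
    shape = re.sub(r"[a-z]+", "a", shape)
    shape = re.sub(r"[0-9]+", "9", shape)
    return shape


def uncovered_shapes(training_names, test_names):
    """Count test-name shapes that never occur among the training names."""
    seen = {name_shape(n) for n in training_names}
    return Counter(
        name_shape(n) for n in test_names if name_shape(n) not in seen
    )


if __name__ == "__main__":
    training = ["John Smith", "Mary Jones"]
    testing = ["Bob Brown", "Smith, John Q."]
    for shape, count in uncovered_shapes(training, testing).items():
        print(shape, count)
```

Shapes that show up here are candidates for a handful of extra labeled examples, rather than evidence that every variation must be labeled.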
Thanks for your patience on this @renospec! It's been a busy week on my end.
When you retrained probablepeople, did you include the canonical training data in addition to your new data? Per the Building & Testing the Code section of the docs, the command should look like this:
parserator train name_data/labeled/labeled.xml,name_data/labeled/company_labeled.xml probablepeople
That's all I can think of off the top of my head that might be causing your error here. If you did that correctly, then the next step will be for me to take a look at your new training data and the name formats you're testing to see if I can reproduce your error. Are you comfortable sharing that data?
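To make a report like this reproducible, it can help to collect the failing names together with the labels you expect. The following is a minimal sketch, assuming a parser callable that returns (token, label) pairs for a string; `find_failures` is a hypothetical helper, not part of parserator, and the label names in any test cases are whatever your own model uses.

```python
def find_failures(parse, cases):
    """Return the cases where the parser output differs from expectations.

    parse: a callable taking a string and returning (token, label) pairs.
    cases: an iterable of (name, expected_pairs) tuples.
    """
    failures = []
    for name, expected in cases:
        actual = [tuple(pair) for pair in parse(name)]
        if actual != [tuple(pair) for pair in expected]:
            failures.append((name, expected, actual))
    return failures
```

With a retrained model installed, passing `probablepeople.parse` as the `parse` argument should work under the assumption above, and the resulting list of (name, expected, actual) triples is a compact thing to attach to an issue.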