Comments (8)
First of all, that is a completely awesome use case for CLIFF!!
Secondly, you are right that the parser is case sensitive. This comes
from the underlying Stanford CoreNLP parser, which uses the case of the text
as an indicator for the entities it is extracting. You would want to
download a caseless model for Stanford NER, which you can find here:
http://nlp.stanford.edu/software/CRF-NER.shtml
And then integrate that into the CLAVIN technology that underlies CLIFF:
https://github.com/Berico-Technologies/CLAVIN
You might also try posting on CLAVIN's GitHub page to see if anyone has
integrated a caseless version of the parser; maybe you could just use
their code. A caseless parser would have a wide variety of applications,
such as parsing Twitter posts and text messages.
Let me know if we can help you further. We would love to see the final result!
Catherine
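To illustrate why case matters here: NER models of this kind commonly use "word shape" style features, so a lowercased place name looks like an ordinary word to a model trained on cased text. The snippet below is a simplified, hypothetical illustration of that idea (the class and method names are made up for this example); it is not Stanford NER's actual feature code.

```java
// Simplified illustration only; NOT Stanford NER's real feature extraction.
// WordShapeDemo and shape() are hypothetical names used for this example.
final class WordShapeDemo {

    /** Collapses a token to a coarse shape: uppercase -> 'X', lowercase -> 'x', digit -> 'd'. */
    static String shape(String token) {
        StringBuilder sb = new StringBuilder(token.length());
        for (char c : token.toCharArray()) {
            if (Character.isUpperCase(c)) {
                sb.append('X');
            } else if (Character.isLowerCase(c)) {
                sb.append('x');
            } else if (Character.isDigit(c)) {
                sb.append('d');
            } else {
                sb.append(c); // keep punctuation and other characters as-is
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A capitalized place name and its lowercased form yield different shapes,
        // so a model that relies on the capitalized shape can miss the second one.
        System.out.println(shape("Paul")); // Xxxx
        System.out.println(shape("paul")); // xxxx
    }
}
```

A caseless model is trained so that it does not depend on this distinction, which is why swapping the model is the suggested fix rather than preprocessing the lyrics.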
On Mon, Jul 14, 2014 at 5:14 PM, Kevin Dyke [email protected] wrote:
Hi there!
We're planning to utilize CLIFF as part of a broader project on the
history of hip hop in the Twin Cities. The idea is to feed lyrics into the
parser and see what sort of geographical rhyming is happening. Not quite
the use case you envisioned, I imagine, but that's the beauty of FOSS,
right?
Anyways, based on the lyrics we've collected/seen, many sources do not
capitalize place names. From my testing it seems that CLIFF's text parser
is case sensitive, and I'm wondering if there's a fairly painless way to
make it case insensitive?
If you could at least point me in the right direction in the code, I can
take a crack at it.
Thanks!
from cliff-annotator.
Thanks for the tips! We'll keep you apprised of how things progress. For now I'll close this issue. Thanks again!
from cliff-annotator.
Indeed, if you are using the CLAVIN-NERD distribution in CLIFF, you can load a caseless model for Stanford NER as Catherine mentioned. The "regular" version of CLAVIN, however, uses Apache OpenNLP for named entity recognition, and I'm not aware of any caseless models for OpenNLP.
from cliff-annotator.
Hey Charlie --
I've been meaning to contact you to let you know that Rahul and I wrote a
paper about CLIFF-CLAVIN that was just accepted to a workshop at KDD about
news knowledge discovery - http://ailab.ijs.si/~blazf/NewsKDD2014/
I'm attaching the paper here for your reference (can I attach things in
GitHub? Going to give it a shot). I tried emailing your
bericotechnologies account but it bounced.
www.kanarinka.com || [email protected] || 617-501-2441
On Wed, Jul 16, 2014 at 7:06 AM, Charlie Greenbacker <[email protected]> wrote:
Indeed, if you are using the CLAVIN-NERD
(https://github.com/Berico-Technologies/CLAVIN-NERD) distribution in
CLIFF, you can load a caseless model for Stanford NER as Catherine
mentioned. The "regular" version of CLAVIN, however, uses Apache OpenNLP
for named entity recognition, and I'm not aware of any caseless models for
OpenNLP.
from cliff-annotator.
Thanks Charlie, I'll swap out CLAVIN for CLAVIN-NERD. That explains some things. I had implemented the caseless Stanford NER on the CLIFF side of things without messing with CLAVIN, and my results were, to say the least, interesting. Thanks again!
from cliff-annotator.
Catherine, I just responded to you via email at your ikatun.org address. Please let me know if you don't receive it!
from cliff-annotator.
Short story - CLIFF is using Stanford-NER and it's not hard to drop in a different model.
Details:
CLIFF uses Stanford-NER, not Apache OpenNLP. However, we could easily be using a case-sensitive NER model. ParseManager.java#L232 is where it loads the model, but of course it just does that from the config file. The README explains how that works and which model we're using. To add a new model you just have to add a case to this switch statement and edit the config file.
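For anyone following along, the "add a case to the switch statement and edit the config file" step could look roughly like the sketch below. This is a minimal, self-contained illustration, not CLIFF's actual code: the class name, enum constants, property key, and model paths are all assumptions made for the example, so check the linked source and README for the real identifiers.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical sketch; NerModelSelector, Model, MODEL_KEY and the paths below
// are illustrative assumptions, not CLIFF's real identifiers.
final class NerModelSelector {

    /** Models the extractor knows about; the caseless entry is the newly added one. */
    enum Model { ENGLISH_3CLASS, ENGLISH_3CLASS_CASELESS }

    /** Assumed name of the config property that selects the model. */
    static final String MODEL_KEY = "ner.modelToUse";

    /** Maps a model to the serialized classifier file to load (paths are assumed). */
    static String classifierPath(Model model) {
        switch (model) {
            case ENGLISH_3CLASS:
                return "models/english.all.3class.distsim.crf.ser.gz";
            case ENGLISH_3CLASS_CASELESS: // the new case for the caseless model
                return "models/english.all.3class.caseless.distsim.crf.ser.gz";
            default:
                throw new IllegalArgumentException("Unsupported model: " + model);
        }
    }

    /** Reads the model choice from the config, falling back to the cased default. */
    static Model modelFromConfig(Properties config) {
        String name = config.getProperty(MODEL_KEY, Model.ENGLISH_3CLASS.name());
        return Model.valueOf(name.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(MODEL_KEY, "ENGLISH_3CLASS_CASELESS");
        System.out.println(classifierPath(modelFromConfig(config)));
    }
}
```

The returned path would then be handed to whatever call loads the Stanford NER classifier, so switching models only requires changing the one config value.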
from cliff-annotator.
Interesting. That was what I did in the first place (see it here on our fork:
https://github.com/SemanticArchives/CLIFF/blob/d75ed0eb7e8e8cc5ad6a16761458a8ea09219113/src/main/java/org/mediameter/cliff/extractor/StanfordNamedEntityExtractor.java#L58).
It seemed that I was getting odd results, but I think I'll do more
extensive testing (I only used a couple of test strings).
On Wed, Jul 16, 2014 at 1:26 PM, rahulbot [email protected] wrote:
Short story - CLIFF is using Stanford-NER and it's not hard to drop in a
different model.
Details:
CLIFF uses Stanford-NER, not Apache OpenNLP. However, we could easily be
using a case-sensitive NER model. ParseManager.java#L232
(https://github.com/c4fcm/CLIFF/blob/master/src/main/java/org/mediameter/cliff/ParseManager.java#L232)
is where it loads the model, but of course it just does that from the
config file. The README
(https://github.com/c4fcm/CLIFF/blob/3135633059a78f9eb4bd0f06549f63a06458e143/README.md#nermodeltouse)
explains how that works and which model we're using. To add a new model you
just have to add a case to this switch statement
(https://github.com/c4fcm/CLIFF/blob/c52140218a25cc3bee992690d1d6fd5cba836776/src/main/java/org/mediameter/cliff/extractor/StanfordNamedEntityExtractor.java#L58)
and edit the config file.
from cliff-annotator.