Comments (8)
I'd like to push back a little on the idea that `number_RBR` is a common usage. Of the 1053 usages of `number` in our training data, 1052 are `number_NN`, and the last one is `Revolution Number_NNP 9`, which I believe should also be `NN` based on the latest tagging guidelines. Also, it should be `JJR`, not `RBR`, right? In your example, `his right side is numb_JJ`, `numb` is clearly an adjective.
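
For anyone who wants to check a tag distribution like this on their own data, here is a minimal sketch. It assumes the corpus is available as whitespace-separated `word_TAG` tokens; the `tag_counts` helper and the `sample` string are made up for illustration and are neither the actual training data nor a CoreNLP API.

```python
from collections import Counter


def tag_counts(tagged_text, word):
    """Count the tags assigned to `word` in whitespace-separated word_TAG tokens."""
    counts = Counter()
    for token in tagged_text.split():
        # Split on the last underscore so word forms containing "_" still work.
        form, sep, tag = token.rpartition("_")
        if sep and form.lower() == word.lower():
            counts[tag] += 1
    return counts


# Toy sample for illustration only (hypothetical, not the real training data).
sample = "The_DT number_NN is_VBZ high_JJ ._. Revolution_NNP Number_NNP 9_CD"
print(dict(tag_counts(sample, "number")))
```

Matching is case-insensitive here so that `Number_NNP` is counted alongside `number_NN`; drop the `.lower()` calls if you want case-sensitive counts.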
It's hard to even come up with examples that make sense. Nevertheless, on the walk to work I came up with a few. If we add these to the training data, the models might pick up that `number` is sometimes an adjective... but it's going to be drowning in over 1000 examples of nouns, so I'm not sure it will make much difference.
- As my arthritis gets worse, my thigh gets number_JJR
- Cocaine makes my lips number_JJR than meth
- The only time I felt number_JJR was when I rubbed one out three times in a row

Proposed parses for these:
( (S
    (SBAR
      (IN As)
      (S
        (NP (PRP$ my) (NN arthritis))
        (VP
          (VBZ gets)
          (ADJP (JJR worse)))))
    (, ,)
    (NP (PRP$ my) (NN thigh))
    (VP
      (VBZ gets)
      (ADJP (JJR number)))))
( (S
    (NP (NN Cocaine))
    (VP
      (VBZ makes)
      (S
        (NP (PRP$ my) (NNS lips))
        (ADJP
          (ADJP (JJR number))
          (PP
            (IN than)
            (NP (NN meth))))))))
( (S
    (NP
      (NP (DT The) (JJ only) (NN time))
      (SBAR
        (S
          (NP (PRP I))
          (VP
            (VBD felt)
            (ADJP (JJR number))))))
    (VP
      (VBD was)
      (SBAR
        (WHADVP (WRB when))
        (S
          (NP (PRP I))
          (VP
            (VBD rubbed)
            (NP (NN one))
            (PRT (RP out))
            (NP
              (NP (CD three) (NNS times))
              (PP
                (IN in)
                (NP (DT a) (NN row))))))))))
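
As a quick sanity check on trees like these, the word/tag pairs can be pulled back out of the bracketed notation. The following is only a rough sketch using a regular expression over the tree text; `tagged_leaves` is a hypothetical helper, not part of CoreNLP, and it assumes every leaf is a single token of the form `(TAG word)`.

```python
import re


def tagged_leaves(tree):
    """Return (word, tag) pairs for the preterminals of a Penn-style bracketed tree."""
    # A preterminal is "(TAG word)": open paren, tag, whitespace, one token, close paren.
    pairs = re.findall(r"\(([^\s()]+)\s+([^\s()]+)\)", tree)
    return [(word, tag) for tag, word in pairs]


# The second proposed parse from this thread.
tree = """( (S
    (NP (NN Cocaine))
    (VP
      (VBZ makes)
      (S
        (NP (PRP$ my) (NNS lips))
        (ADJP
          (ADJP (JJR number))
          (PP
            (IN than)
            (NP (NN meth))))))))"""

for word, tag in tagged_leaves(tree):
    print(f"{word}_{tag}")
```

This makes it easy to confirm that `number` carries `JJR` in each tree before adding it to training data.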
from corenlp.
also, as a followup, the CoreNLP lemmatizer already properly handles number_JJR
from corenlp.
First, thanks so much for the speedy and thoughtful reply!
Second, I think you are totally right that it should be JJR not RBR (that was my mistake, apologies), and that is indeed quite a difference in statistical distribution within the training data, so I'm not surprised that the tagger struggles. However, for whatever reason, I am seeing different behavior from you. On corenlp.run, for example, it's definitely not labeling things as JJR and is lemmatizing as "number":
![Screen Shot 2023-08-01 at 1 36 58 PM](https://private-user-images.githubusercontent.com/51869/257635612-53bdb161-556a-436d-8608-eb38db1c84aa.png)
And running locally I'm getting the same result.
from corenlp.
Aha! I understand, thanks for the clarification. Those proposed parses look fine to me; I agree that it's not likely to overcome that degree of word-sense imbalance in the data but it certainly can't hurt to include a few bonus examples for the PoS tagger.
from corenlp.
Oh and also, now I'm confused by your comment "also, as a followup, the CoreNLP lemmatizer already properly handles number_JJR" - as best as I can tell, it definitely is not handling that scenario, and is lemmatizing it to "number_NN".
from corenlp.
> also, as a followup, the CoreNLP lemmatizer already properly handles number_JJR

If by chance you give the lemmatizer `number` with the tag `JJR`, it returns the lemma `numb`. I used it to convert those trees to a UD representation in the commit I made above, for example.
stanfordnlp/handparsed-treebank@c1a405b
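
To make the point about tag-dependent lemmas concrete, here is a toy sketch. It is emphatically not CoreNLP's lemmatizer or its rules; `toy_lemma` is a hypothetical function that only illustrates why the same surface form can map to different lemmas depending on the tag the tagger assigns.

```python
def toy_lemma(word, tag):
    """Toy tag-conditioned lemmatizer (illustrative only, NOT CoreNLP's actual rules)."""
    lower = word.lower()
    # Comparatives: strip a trailing "er". This ignores consonant doubling,
    # "-ier" forms, and irregulars such as "worse".
    if tag in {"JJR", "RBR"} and lower.endswith("er") and len(lower) > 3:
        return lower[:-2]
    # Everything else, including singular nouns, is returned unchanged.
    return lower


print(toy_lemma("number", "JJR"))  # numb
print(toy_lemma("number", "NN"))   # number
```

So if the tagger outputs `NN`, a lemmatizer that behaves this way will return `number` even though it would return `numb` when given `JJR`, which matches the behavior both sides of this thread observed.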
from corenlp.