Yes, there have traditionally been various bugs caused by the separation of word() and value(), even though there were some good reasons for that separation.
But have you tried to reproduce this on v3.5 or the current master? I just added the following code to the test StanfordCoreNLPITest:test(), based on your code above:
// check that dependency graph vertices have word() set and that it matches value()
SemanticGraph deps = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation.class);
for (IndexedWord vertex : deps.vertexSet()) {
    Assert.assertNotNull(vertex.word());
    Assert.assertEquals(vertex.word(), vertex.value());
}
That code passes for me, so maybe this is already fixed? If not, can you provide a failing test case?
(I realize you may have good compatibility reasons for staying off Java 8 at the moment, but I suspect we don't currently have the energy to release a v3.4.2 unless more problems, or more serious ones, are found...)
I have reproduced the problem with the trunk version of DKPro Core's StanfordCoreferenceResolver, which uses CoreNLP 3.5.0.
We transform the UIMA CAS representation of the required annotations into the CoreNLP representation before invoking the MentionExtractor. The procedure is roughly as follows:
- convert tokens
- convert parse tree using LabeledScoredTreeFactory and TreeUtils.createStanfordTree
- convert sentences
- regenerate dependencies from the parse tree using a GrammaticalStructureFactory obtained from the TreebankLanguagePack, then finalize the conversion using ParserAnnotatorUtils.fillInParseAnnotations
In this process, value() and word() are not set to the same value. I added the code suggested by Anne as a workaround.
So I guess you are saying that DKPro Core should set word() and value() to the same value when the tokens are converted from UIMA to CoreNLP? Currently, we only set:
- originalText
- word
- beginPosition
- endPosition
- lemma (optional)
- tag (optional)
I tried an alternative fix in DKPro Core: calling setValue(token-text) when converting the UIMA token to the CoreNLP token (actually a CoreLabel), and removing the fix that Anne suggested. That alternative fix, however, doesn't work. I'm not sure where or why the value gets lost in the process described above.
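One way to narrow down where the value gets lost is to run the word()/value() check from the test above after each conversion step, instead of only at the end. The sketch below is self-contained and does not use CoreNLP: `SketchLabel` is a hypothetical stand-in for CoreLabel that stores word and value independently, and the stages are hypothetical placeholders for steps such as tree construction or dependency regeneration. The second stage deliberately copies only the word to simulate a step that drops the value; this illustrates the diagnostic pattern, not where CoreNLP actually loses it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.UnaryOperator;

final class ValueLossTrace {
    /** Hypothetical stand-in for CoreLabel: word and value are stored independently. */
    static final class SketchLabel {
        String word;
        String value;

        SketchLabel(String word, String value) {
            this.word = word;
            this.value = value;
        }
    }

    /** Same condition the test asserts: word is non-null and equals value. */
    static boolean consistent(SketchLabel l) {
        return l.word != null && Objects.equals(l.word, l.value);
    }

    /**
     * Applies each stage in order to every label and returns the index of the
     * first stage after which some label is inconsistent, or -1 if none is.
     */
    static int firstBadStage(List<SketchLabel> labels, List<UnaryOperator<SketchLabel>> stages) {
        List<SketchLabel> current = labels;
        for (int i = 0; i < stages.size(); i++) {
            List<SketchLabel> next = new ArrayList<>();
            for (SketchLabel l : current) {
                next.add(stages.get(i).apply(l));
            }
            for (SketchLabel l : next) {
                if (!consistent(l)) {
                    return i;
                }
            }
            current = next;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<SketchLabel> tokens = Arrays.asList(
                new SketchLabel("Stanford", "Stanford"),
                new SketchLabel("NLP", "NLP"));
        // Hypothetical stage 0: copies both fields.
        UnaryOperator<SketchLabel> copyBoth = l -> new SketchLabel(l.word, l.value);
        // Hypothetical stage 1: copies only word, simulating a step that drops value.
        UnaryOperator<SketchLabel> copyWordOnly = l -> new SketchLabel(l.word, null);
        System.out.println(firstBadStage(tokens, Arrays.asList(copyBoth, copyWordOnly))); // prints 1
    }
}
```

In the real pipeline, the equivalent would be to inspect the labels (tokens, tree leaves, dependency graph vertices) right after each of the conversion steps listed earlier.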