Comments (9)
Hi,
Interesting. When I run on that file, I do hit an exception from a bug (which I
have fixed), but it is not the exception you reported. Your stack trace looks an
awful lot like the caching inside Java's built-in Long class is doing funny
things -- might it have something to do with your ExecJavaMojo calling things
through reflection?
In any case, I have fixed the bug and am running some tests before I release a
fix. 1.1.1 should be out by tomorrow.
Original comment by [email protected]
on 9 Aug 2012 at 5:31
- Changed state: Started
from berkeleylm.
Hi,
Thanks for looking into the issue so quickly.
Interesting that you don't see the same exception. I assume that since
berkeleylm is written in Java, it should support input encoded in UTF-8. Is
that a fair assumption?
I have tried calling the program through maven (I imported all the source)
and also without using maven at all, and I see the same exception in both
cases, which is a bit odd if it is caused by reflection.
Original comment by [email protected]
on 9 Aug 2012 at 5:43
UTF-8 should be fine. Hopefully the fix I've committed will resolve your issue
in any case.
Original comment by [email protected]
on 9 Aug 2012 at 7:33
Apologies, I fell asleep on this fix. Version 1.1.1 has been uploaded. Let me
know if this doesn't fix your issue.
Original comment by [email protected]
on 13 Aug 2012 at 2:02
- Changed state: Fixed
I unzipped the new 1.1.1 code but unfortunately am still seeing the same
ArrayIndexOutOfBoundsException. I have tried on a different input data set in
case that was the problem (en-test.txt, attached below) but I see the same
problem on that input.
Here are the steps I took to produce the error:
1. Unzip the code
2. cd to the top level directory, berkeleylm-1.1.1
3. Run ant from the top level directory
4. From the top level directory, run:
java -cp jar/berkeleylm.jar edu.berkeley.nlp.lm.io.MakeKneserNeyArpaFromText 5 test-en.model en-test.txt
5. Output is:
Reading text files [en-test.txt] and writing to file test-en.model {
Reading in ngrams from raw text {
On line 0
} [2s]
Writing Kneser-Ney probabilities {
Counting counts for order 0 {
} [0s]
Counting counts for order 1 {
} [0s]
Counting counts for order 2 {
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 256
at java.lang.Long.valueOf(Long.java:548)
at edu.berkeley.nlp.lm.map.ExplicitWordHashMap$KeyIterator.next(ExplicitWordHashMap.java:140)
at edu.berkeley.nlp.lm.map.ExplicitWordHashMap$KeyIterator.next(ExplicitWordHashMap.java:121)
at edu.berkeley.nlp.lm.collections.Iterators$Transform.next(Iterators.java:107)
at edu.berkeley.nlp.lm.io.KneserNeyLmReaderCallback.parse(KneserNeyLmReaderCallback.java:284)
at edu.berkeley.nlp.lm.io.LmReaders.createKneserNeyLmFromTextFiles(LmReaders.java:299)
at edu.berkeley.nlp.lm.io.MakeKneserNeyArpaFromText.main(MakeKneserNeyArpaFromText.java:57)
Original comment by [email protected]
on 15 Aug 2012 at 11:34
Followed your steps and did not encounter any exceptions. I'm guessing this is
a bug in your JVM -- the exception is occurring while boxing a long! You can
try using a different JVM, or even try using -server (which you should do
anyway, for speed).
Original comment by [email protected]
on 15 Aug 2012 at 5:10
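For context on why an index of 256 is so surprising in that trace: as far as I recall from the JDK 6 sources, Long.valueOf serves values from -128 to 127 out of a 256-element cache indexed by value + 128. The sketch below is not the JDK code; the class and method names (LongCacheSketch, inCachedRange, cacheIndex) are made up for illustration. It shows that index 256 would correspond to the value 128, which the range check should never let through, so a miscompiled check in an old JIT is at least a plausible culprit.

```java
public class LongCacheSketch {
    // Assumed constants, mirroring the documented Long.valueOf cache range (-128..127).
    static final int OFFSET = 128;
    static final int CACHE_SIZE = 256;

    /** True when v falls in the range that Long.valueOf is documented to cache. */
    public static boolean inCachedRange(long v) {
        return v >= -128 && v <= 127;
    }

    /** Hypothetical helper: the cache slot a value would map to (value + 128). */
    public static int cacheIndex(long v) {
        if (!inCachedRange(v)) {
            throw new IllegalArgumentException("not a cached value: " + v);
        }
        return (int) v + OFFSET;
    }

    public static void main(String[] args) {
        // Values in -128..127 are always cached, so both calls return the same object.
        Long a = Long.valueOf(127L);
        Long b = Long.valueOf(127L);
        System.out.println(a == b);              // true

        System.out.println(cacheIndex(-128L));   // 0, the first slot
        System.out.println(cacheIndex(127L));    // 255, the last valid slot

        // Slot 256 would need the value 128, which the range check rejects.
        System.out.println(inCachedRange(128L)); // false
        System.out.println(128 + OFFSET == CACHE_SIZE); // true: index 256 is one past the end
    }
}
```

If a correct JVM can never compute slot 256, then seeing ArrayIndexOutOfBoundsException: 256 inside Long.valueOf points at the runtime rather than at berkeleylm.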
Thanks again for testing this out. It is quite odd that the error comes from
boxing a long. I ran both with and without -server but saw the exception in
both cases. I'm going to try a different JVM. Would you mind posting the output
you get from running "java -version" so that I can start with that
implementation? I'm using HotSpot 64 bit:
$ java -version
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)
Thanks for the help.
Original comment by [email protected]
on 15 Aug 2012 at 5:28
$ java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-10M3720)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03-424, mixed mode)
Original comment by [email protected]
on 15 Aug 2012 at 5:56
I updated my java-6-sun JVM to 1.6.0_34; I had been using a version from 2008.
I no longer see the exception. Looks like Oracle has been hard at work fixing
autoboxing issues in the last few years. :)
Original comment by [email protected]
on 15 Aug 2012 at 8:58
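Since the fix turned out to be a JVM update, it can help to print the runtime's version from inside the program when reporting a problem like this. The following is only a sketch: the system property names are standard, but the class JvmVersionSketch and the helper updateNumber are hypothetical, and the parsing assumes the legacy "1.6.0_10" style of version string seen in this thread.

```java
public class JvmVersionSketch {
    /**
     * Hypothetical helper: extracts the update number from a legacy version
     * string such as "1.6.0_10". Returns -1 when there is no "_<digits>" part.
     */
    public static int updateNumber(String version) {
        int underscore = version.indexOf('_');
        if (underscore < 0) {
            return -1;
        }
        int end = underscore + 1;
        while (end < version.length() && Character.isDigit(version.charAt(end))) {
            end++;
        }
        if (end == underscore + 1) {
            return -1; // underscore not followed by digits
        }
        return Integer.parseInt(version.substring(underscore + 1, end));
    }

    public static void main(String[] args) {
        // Standard system properties describing the running JVM.
        String version = System.getProperty("java.version");
        System.out.println("java.version    = " + version);
        System.out.println("java.vm.name    = " + System.getProperty("java.vm.name"));
        System.out.println("java.vm.version = " + System.getProperty("java.vm.version"));
        System.out.println("update number   = " + updateNumber(version));
    }
}
```

Including this output in a bug report gives the same information as "java -version", but for the JVM that actually ran the failing code.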