
iota's People

Contributors

smee, thebusby

iota's Issues

IndexOutOfBoundsException with chunk-seq

Hi,

Simply trying to create a chunk-seq on a ~10 GB file throws an IndexOutOfBoundsException.
I can read the file using other functions.

Versions used are:

  • Java 1.8.0_91
  • Clojure 1.8.0
  • iota 1.1.3

See the stacktrace below:

user=> (iota/chunk-seq "/path/to/a_big_file.csv" 1024)

IndexOutOfBoundsException   java.nio.Buffer.checkBounds (Buffer.java:567)
user=> *e
#error {
 :cause nil
 :via
 [{:type java.lang.IndexOutOfBoundsException
   :message nil
   :at [java.nio.Buffer checkBounds "Buffer.java" 567]}]
 :trace
 [[java.nio.Buffer checkBounds "Buffer.java" 567]
  [java.nio.ByteBuffer get "ByteBuffer.java" 686]
  [java.nio.DirectByteBuffer get "DirectByteBuffer.java" 285]
  [iota.Mmap get "Mmap.java" 79]
  [iota.FileRecordSeq nextChunkEnd "FileRecordSeq.java" 116]
  [iota.FileRecordSeq split "FileRecordSeq.java" 139]
  [iota.FileChunkSeq first "FileChunkSeq.java" 35]
  [clojure.lang.RT nthFrom "RT.java" 942]
  [clojure.lang.RT nth "RT.java" 897]
  [clojure.core$print_sequential invokeStatic "core_print.clj" 58]
  [clojure.core$fn__6072 invokeStatic "core_print.clj" 153]
  [clojure.core$fn__6072 invoke "core_print.clj" 153]
  [clojure.lang.MultiFn invoke "MultiFn.java" 233]
  [clojure.tools.nrepl.middleware.pr_values$pr_values$fn$reify__907 send "pr_values.clj" 35]
  [clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn__941$fn__954 invoke "interruptible_eval.clj" 113]
  [clojure.main$repl$read_eval_print__7408 invoke "main.clj" 241]
  [clojure.main$repl$fn__7417 invoke "main.clj" 258]
  [clojure.main$repl invokeStatic "main.clj" 258]
  [clojure.main$repl doInvoke "main.clj" 174]
  [clojure.lang.RestFn invoke "RestFn.java" 1523]
  [clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn__941 invoke "interruptible_eval.clj" 87]
  [clojure.lang.AFn applyToHelper "AFn.java" 152]
  [clojure.lang.AFn applyTo "AFn.java" 144]
  [clojure.core$apply invokeStatic "core.clj" 646]
  [clojure.core$with_bindings_STAR_ invokeStatic "core.clj" 1881]
  [clojure.core$with_bindings_STAR_ doInvoke "core.clj" 1881]
  [clojure.lang.RestFn invoke "RestFn.java" 425]
  [clojure.tools.nrepl.middleware.interruptible_eval$evaluate invokeStatic "interruptible_eval.clj" 85]
  [clojure.tools.nrepl.middleware.interruptible_eval$evaluate invoke "interruptible_eval.clj" 55]
  [clojure.tools.nrepl.middleware.interruptible_eval$interruptible_eval$fn__986$fn__989 invoke "interruptible_eval.clj" 222]
  [clojure.tools.nrepl.middleware.interruptible_eval$run_next$fn__981 invoke "interruptible_eval.clj" 190]
  [clojure.lang.AFn run "AFn.java" 22]
  [java.util.concurrent.ThreadPoolExecutor runWorker "ThreadPoolExecutor.java" 1142]
  [java.util.concurrent.ThreadPoolExecutor$Worker run "ThreadPoolExecutor.java" 617]
  [java.lang.Thread run "Thread.java" 745]]}

Null Pointer Exception

(def mem-file (iota/vec "path/to/moby-dic.txt"))

(defn words [coll]
  (mapcat (fn [line]
            (re-seq #"[a-z]+" line))
          coll))

(frequencies (words mem-file))

I get a

NullPointerException java.util.regex.Matcher.getTextLength (Matcher.java:1234)

But if I call

(words mem-file)

I get a beautiful list of words.

I guess it has something to do with the machinery of lazy sequences in Clojure, which get computed in batches.

So when there's no lazy seq to return, this doesn't happen.

Otherwise java.util.regex.Matcher.getTextLength gets called, and this mmap-based representation of the file returns a null pointer.

But I didn't look at the code; these are wild guesses on my side.

Can you help me?

Thanks
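
One hypothesis worth checking, sketched below without iota itself: `re-seq` throws exactly this NullPointerException when it is handed `nil` instead of a string, so the error would appear if any element of the file vector is `nil` (for example, an empty line). The `lines` vector is a hypothetical stand-in for `(iota/vec ...)`, and the presence of `nil` elements is an assumption, not something confirmed from the report.

```clojure
;; Hypothetical stand-in for (iota/vec "path/to/file"):
;; a plain vector with nil where an empty line is assumed to be.
(def lines ["call me ishmael" nil "some years ago"])

(defn words
  "Returns a lazy seq of lowercase words, skipping nil lines."
  [coll]
  (mapcat (fn [line]
            (re-seq #"[a-z]+" line))
          ;; Drop nil elements before they reach re-seq.
          (remove nil? coll)))

;; Counts each word across the non-nil lines.
(def word-counts (frequencies (words lines)))
```

If the guarded version runs cleanly on the real file, the lazy-sequence machinery is probably not at fault; the failure would simply surface once a `nil` element is reached.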

Can't use arbitrary separators with iota/vec

The following does what I expect:

(let [f "iotavectest-succeed"]
  (spit f "0\n1")
  (let [s (iota/seq f)
        v (iota/vec f)]
    (doseq [xs [s v]]
      (println (type xs))
      (doseq [x xs]
        (printf "|%s|\n" x)))))
; iota.FileSeq
; |0|
; |1|
; iota.FileVector
; |0|
; |1|

The following (using an arbitrary separator) does not do what I expect:

(let [f "iotavectest-fail"]
  (spit f "0.1")
  (let [s (iota/seq f 0x40000 \.)
        v (iota/vec f 10 \.)]
    (doseq [xs [s v]]
      (println (type xs))
      (doseq [x xs]
        (printf "|%s|\n" x)))))
; iota.FileSeq
; |0|
; |1|
; iota.FileVector
; IndexOutOfBoundsException getLine() failure: Chunk #0's [1]  is not within  0...1iota.FileVector.getLine (FileVector.java:152)
; |0.1|

My assumption is that I should be able to use an arbitrary byte other than newline to delimit records/lines in my file. Am I mistaken?

Thanks!

Default Buffer Size, iota Choking

I'm working with a synthetic dataset that I generated. It's about 2GB, with about 496 chars per field, 40 tab-delimited fields per line. Newline separated.

Much to my surprise, iota just chokes on it. I'm using 1.1.2, although it looks like FileSeq hasn't changed apart from the chunked fileseq support. The issues and context are detailed at https://gist.github.com/joinr/5cbdaf6c66924129f398. I assume that the issues persist in 1.3, but I have not tested that yet (for my use case, I believe the seq functionality and buffer size are unchanged).

At least one concrete issue is the default buffer size being too large (for my machine at least; I may be an outlier). I know the standard libs default to a much smaller buffer. On my machine, the iota default just kills performance. I understand that the caller can set the buffer size, but that ends up being non-obvious. There are perhaps deeper performance issues described in the gist (when comparing performance with alternative implementations).

Thanks for iota.
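
For a baseline to compare against, here is a minimal sketch of reading a file through the standard library with an explicit buffer size. `java.io.BufferedReader` defaults to an 8192-character buffer when no size is given; the `count-lines` helper and the file name in the comment are hypothetical, and this is not how iota reads files.

```clojure
(import '(java.io BufferedReader FileInputStream InputStreamReader))

(defn count-lines
  "Counts the lines in the file at `path`, reading through a
  BufferedReader whose buffer holds `buf-size` characters."
  [path buf-size]
  (with-open [rdr (BufferedReader.
                    (InputStreamReader. (FileInputStream. (str path)) "UTF-8")
                    (int buf-size))]
    ;; line-seq is lazy, so it must be consumed inside with-open.
    (count (line-seq rdr))))

;; Example comparison on a hypothetical file:
;; (time (count-lines "data.tsv" 8192))
;; (time (count-lines "data.tsv" (* 1024 1024)))
```

Timing the same file at a few buffer sizes gives a rough picture of where throughput stops improving on a given machine. The failing example earlier on this page passes a buffer size as the second argument to iota/seq, so the same experiment could be repeated there.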
