Comments (16)

rtobar commented on June 23, 2024

What backend are you using? See what happens when you try to import the other backends explicitly (see the README file for instructions). I suspect you are using the python backend, which is most likely the cause of the trouble.

Also, what platform are you on, and how did you get ijson? There are binary wheels in PyPI for most Linux/Mac combinations in which the yajl2_c backend should work correctly.

from ijson.

rtobar commented on June 23, 2024

Separately, do you need to iterate over the whole file a first time? If you are extracting/filtering only some information out of it you could break sooner and not read all of it -- unless of course the filtering itself depends on the length of the locations array.

vongohren commented on June 23, 2024

@rtobar I think I jumped in a bit early on how to use this. I did not look at any of the backend stuff because I thought it was for special cases.
I just import ijson in my current code and run it, with no backend handling, but I guess I need to look into it.

Right now I'm running it on a Mac, and I'm aiming to run it in a Docker container, where the VM has a memory limit of 2 GB.

I will come back with some questions after I read up on your suggestion.

vongohren commented on June 23, 2024

@rtobar I can't really get the backends to load in any different way. I tried

import ijson.backends.yajl2_cffi as ijson

but with yajl2_c

But I was not able to load this; it failed.

Is there anything else one should do to load the backend properly?

rtobar commented on June 23, 2024

@vongohren sorry, but I couldn't understand what exactly worked and what didn't. Did both yajl2_cffi and yajl2_c fail? Also, please let me know how you installed ijson -- whether you installed it manually from the repository (works, but if you don't have the yajl2 library available in your system it won't build the yajl2_c backend) or using pip (preferred method, the yajl2_c backend should work out of the box).

A different test you can try out is running the https://github.com/ICRAR/ijson/blob/master/benchmark.py tool. Download that file, and run benchmark.py -l to get a list of the backends you have available. You can also try benchmark.py -i your_file.json -M kvitems to see how long it takes to parse via kvitems with the different backends (and you can use the -B flag to select a particular backend, if available). See all help with benchmark.py --help.
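
For a quick check without downloading benchmark.py, the backends can also be probed from inside the same interpreter or virtualenv where ijson was installed. The following is a minimal sketch, not part of ijson: the helper name available_backends is made up for illustration, and it only assumes that each backend lives in a module under ijson.backends, as in the import shown earlier in this thread.

```python
import importlib

# Backend names mentioned in this thread.
BACKEND_NAMES = ("yajl2_c", "yajl2_cffi", "yajl2", "python")


def available_backends(names=BACKEND_NAMES, package="ijson.backends"):
    """Return the names whose backend module can be imported (hypothetical helper)."""
    found = []
    for name in names:
        try:
            importlib.import_module(f"{package}.{name}")
        except (ImportError, OSError):
            # Missing package, missing compiled extension, or missing yajl library.
            continue
        found.append(name)
    return found


print(available_backends())
```

If yajl2_c is missing from the printed list, the compiled backend is not importable in that environment, whatever the benchmark tool reports elsewhere.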

vongohren commented on June 23, 2024

@rtobar thanks for the patience 😁 And sorry for the sparse communication. I'm very new to the Python environment, so I still need to learn how all the different pieces fit together 🤓 So I'm not sure how to add yajl2 to my Mac, or to the Docker container this is eventually going to run in.

That benchmarking tool did give some insights!

Backends:
 - python
Benchmarks:
 - long_list
 - big_int_object
 - big_decimal_object
 - big_null_object
 - big_bool_object
 - big_str_object
 - big_longstr_object
 - object_with_10_keys
 - empty_lists
 - empty_objects

I guess I have very few backends available.
But I did install via pip, so as you say, the backend should work out of the box?
To test this I just cloned this repo and ran it on my bare Mac, meaning that pip was not used for this benchmark test.

Is there a way to run the benchmark inside the venv where I ran pip install ijson?

It did finish the test, but the time is not optimal. Do you think it can be faster?

#mbytes,method,test_case,backend,time,mb_per_sec
321.567, kvitems, locations.json, python, 147.779, 2.176

vongohren commented on June 23, 2024

Ok, so a bit silly, but I just ran brew install yajl and the cloned benchmark now lists yajl2 as an available backend. But is it supposed to show yajl2_c as an available backend as well, if I can use it? I ask because, as I understand it, yajl2 is slower than the two other versions.

#mbytes,method,test_case,backend,time,mb_per_sec
321.567, kvitems, location.json, python, 168.581, 1.907
321.567, kvitems, location.json, yajl2, 104.954, 3.064

But it is still quite time-consuming. Should it be this slow? Or is there any other tweaking I can do?

rtobar commented on June 23, 2024

@vongohren thanks for all the details, now things are becoming clear. Indeed you were using the python backend, which was my initial suspicion. If you pip install cffi, that will also give you access to the yajl2_cffi backend. On the other hand, you still don't have the yajl2_c backend.

What version of python (and MacOS) are you running? If it's 3.8 that might explain it, as I think (from memory) I had to skip generating binary wheels for that version. This is not the case for Linux wheels, which are generated for all python versions correctly.

Now that you have yajl installed, you could try to compile the package yourself, hoping that you will end up with a usable yajl2_c backend for your tests (again, when building your container this shouldn't be a problem, as the package installed with pip should have it). The yajl2_c backend is usually ~10x faster than yajl2 and yajl2_cffi, so you should be down to reasonable times.

vongohren commented on June 23, 2024

I'm running:
MacOS: 10.15.3
Python: 3.6.7

My code runs when I add import ijson.backends.yajl2_c as ijson
I presume that it is using the right lib for the best speed then?
Or can it fall back to some default mode?

I'm running this simple code:

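    # json_file is the already-opened input file (opening it is not shown here)
    locations = ijson.kvitems(json_file, 'locations.item')
    # Keep only the values stored under the 'timestampMs' key
    timestampMsObjects = (v for k, v in locations if k == 'timestampMs')
    print(timestampMsObjects)  # prints the generator object, not its contents
    timestampMs = list(timestampMsObjects)
    print(len(timestampMs))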

Parsing the file takes 241.8568 seconds, which is not that great. What might the reason be?
This is the benchmark I got on the same file. Should I see yajl2_c in that list?

#mbytes,method,test_case,backend,time,mb_per_sec
321.567, kvitems, location.json, python, 168.581, 1.907
321.567, kvitems, location.json, yajl2, 104.954, 3.064

I've also found this library: https://pypi.org/project/jsonslicer/#description. It was able to get through the file, and I could handle all entries in about 98.12 s, without any special configuration.

I would love to understand whether I'm not actually able to run yajl2_c, and that is why it's not showing its true speed, or whether this is a limit.
Or maybe my code approach is bad?

I'm basically trying to map the JSON file so that just a couple of the map_keys are included.

vongohren commented on June 23, 2024

I also tried this code on this many entries: 1062126
It never finished; I had to kill it.

        parser = ijson.parse(json_file)
        f_out.write("{\"locations\":[")
        for prefix, event, value in parser:
            if(event == "end_map"):
                f_out.write("}")

        f_out.write("]}")

So maybe I'm taking some wrong approach to your lib?

vongohren commented on June 23, 2024

I might have found the culprit. Suddenly my function was blazingly fast.
I removed a memory profiler 😰
JsonSlicer did this in 6.6 seconds.

ijson got the running time down to 10.8 seconds.

But still, I got it faster with JsonSlicer.

rtobar commented on June 23, 2024

> This is the benchmark I got on the same file. Should I see yajl2_c in that list?

Yes. I'm still puzzled: you said that in your code you do import ijson.backends.yajl2_c as ijson, but you don't see that backend on the benchmark list, meaning that the benchmark can't import it. Maybe try running benchmark.py from a different directory, not directly from the top-level directory of the ijson repo (e.g., put it under /tmp and execute it there). Python might be getting confused and loading ijson from the repo instead of the version you have installed.
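
One way to check which copy of ijson Python would pick up is to ask the import system where it finds the package. This is only a sketch using the standard library's importlib.util.find_spec; the helper name module_origin is made up for illustration. If the printed path points into the cloned repository rather than into the virtualenv's site-packages, the benchmark is importing the wrong copy.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Run this from the same directory and interpreter used for benchmark.py.
print(module_origin("ijson"))
```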

> So maybe I'm taking some wrong approach to your lib?

It seems you can do better than what you are doing. You mentioned a couple of times that you just want to take some map keys out of the JSON stream, and in that case kvitems is overkill because you don't really need ijson to create objects for you. You can probably do what you need with ijson.parse, which will be faster than kvitems as it doesn't create any objects.
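
As a rough sketch of the ijson.parse approach: ijson.parse yields (prefix, event, value) tuples, so picking out a single key comes down to comparing the prefix. The helper name values_at is made up for illustration, and the prefix assumes the file layout used earlier in this thread (a top-level locations array of objects with a timestampMs key). The helper takes the event stream as an argument, so the demo feeds it a few hand-written events in the shape ijson.parse produces.

```python
def values_at(events, wanted_prefix):
    """Yield scalar values found at wanted_prefix in a stream of
    (prefix, event, value) tuples such as the one ijson.parse produces."""
    structural = {"start_map", "end_map", "start_array", "end_array", "map_key"}
    for prefix, event, value in events:
        if prefix == wanted_prefix and event not in structural:
            yield value


# Hand-written events for {"locations": [{"timestampMs": "1"}]}
sample_events = [
    ("", "start_map", None),
    ("", "map_key", "locations"),
    ("locations", "start_array", None),
    ("locations.item", "start_map", None),
    ("locations.item", "map_key", "timestampMs"),
    ("locations.item.timestampMs", "string", "1"),
    ("locations.item", "end_map", None),
    ("locations", "end_array", None),
    ("", "end_map", None),
]
print(list(values_at(sample_events, "locations.item.timestampMs")))  # prints ['1']
```

With a real file this would be called as values_at(ijson.parse(json_file), 'locations.item.timestampMs').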

Or try also using `ijson.items(f, 'locations.item.timestampMs')`. That should return only those values and nothing else, rather than building loads of objects that you end up discarding anyway.
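
A minimal sketch of that suggestion, assuming the file layout described in this thread. The function name timestamps is made up for illustration; ijson is imported inside the function only so that the snippet can be loaded on a machine where ijson is not installed.

```python
def timestamps(json_file):
    """Return an iterator over every locations[i].timestampMs value in json_file."""
    import ijson  # third-party dependency: pip install ijson

    return ijson.items(json_file, "locations.item.timestampMs")
```

Usage would look like opening the file in binary mode and running sum(1 for _ in timestamps(f)), which counts the values without keeping them all in memory.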

jpmckinney commented on June 23, 2024

If you’re just looking for the best performance, try pip install ijson==3.0rc2, which is faster than JsonSlicer on their own benchmark.

vongohren commented on June 23, 2024

Thanks @jpmckinney, that is interesting, I will look at it!

vongohren commented on June 23, 2024

@rtobar cool, thanks, I will check it out. It takes some time because this is a hobby project, but I appreciate the feedback. The jsonslicer project has not responded yet, so I would prefer to use this repo, which does answer :)

vongohren commented on June 23, 2024

@rtobar @jpmckinney thanks for the follow-up. I will close this, as I'm moving on with a satisfactory result. The feedback and assistance are much appreciated.
