Comments (16)
What backend are you using? See what happens when you try to import the other backends explicitly (see the README file for instructions). I suspect you are using the python backend, which is most likely the cause of the trouble.
Also, what platform are you on, and how did you get ijson? There are binary wheels on PyPI for most Linux/Mac combinations in which the yajl2_c backend should work correctly.
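A quick way to see which backends can actually be imported is to try each one in turn. The following is only a minimal sketch: the backend names are the ones documented in the ijson README, the helper name probe_backends is made up for illustration, and any exception during import is treated as "unavailable".

```python
import importlib

# Backend names as documented in the ijson README, fastest first.
BACKENDS = ("yajl2_c", "yajl2_cffi", "yajl2", "yajl", "python")


def probe_backends(names=BACKENDS):
    """Return {backend_name: None if importable, else a short error string}."""
    results = {}
    for name in names:
        try:
            importlib.import_module("ijson.backends." + name)
            results[name] = None
        except Exception as exc:  # ImportError, missing shared library, ...
            results[name] = "{}: {}".format(type(exc).__name__, exc)
    return results


if __name__ == "__main__":
    for name, error in probe_backends().items():
        status = "OK" if error is None else "unavailable ({})".format(error)
        print("{:12s} {}".format(name, status))
```

Run it with the same interpreter (and virtualenv) that runs your own code, otherwise the result may describe a different installation.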
from ijson.
Separately, do you need to iterate over the whole file a first time? If you are extracting/filtering only some information out of it you could break sooner and not read all of it -- unless of course the filtering itself depends on the length of the locations array
from ijson.
@rtobar I think I might have jumped a bit early into how to use this. I did not look at any backend stuff because I thought it was for special cases.
I import ijson into my current code and run it, with no backend handling, but I guess I need to look at it.
Right now I'm running it on a Mac, and I'm aiming to run it in a Docker container, where the VM has a memory restriction of 2 GB.
I will come back with some questions after I read up on your suggestion.
from ijson.
@rtobar I can't really get backends to load in any different way. I tried
import ijson.backends.yajl2_cffi as ijson
but with yajl2_c. But I was not able to load this; it failed.
Is there anything else one should do to load the backend properly?
from ijson.
@vongohren sorry, but I couldn't understand what exactly worked and what didn't. Did both yajl2_cffi and yajl2_c fail? Also, please let me know how you installed ijson -- whether you installed it manually from the repository (works, but if you don't have the yajl2 library available in your system it won't build the yajl2_c backend) or using pip (preferred method, the yajl2_c backend should work out of the box).
A different test you can try out is running the https://github.com/ICRAR/ijson/blob/master/benchmark.py tool. Download that file, and run benchmark.py -l to get a list of the backends you have available. You can also try benchmark.py -i your_file.json -M kvitems to see how long it takes to parse via kvitems with the different backends (and you can use the -B flag to select a particular backend, if available). See all help with benchmark.py --help.
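If downloading the benchmark script is inconvenient, a rough stand-in can be written in a few lines. This is not benchmark.py itself, only a sketch of the same idea: it assumes an ijson version whose backends expose kvitems, and the function name time_kvitems and the default backend list are illustrative.

```python
import importlib
import time

# Candidate backends; the ones that fail to import are simply skipped.
BACKENDS = ("yajl2_c", "yajl2_cffi", "yajl2", "python")


def time_kvitems(path, prefix, names=BACKENDS):
    """Run kvitems over the whole file once per importable backend.

    Returns {backend_name: (seconds, number_of_key_value_pairs)}.
    """
    timings = {}
    for name in names:
        try:
            backend = importlib.import_module("ijson.backends." + name)
        except Exception:
            continue  # backend not available on this machine
        with open(path, "rb") as f:
            start = time.perf_counter()
            count = sum(1 for _ in backend.kvitems(f, prefix))
            timings[name] = (time.perf_counter() - start, count)
    return timings


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py your_file.json locations.item
    for name, (seconds, count) in time_kvitems(sys.argv[1], sys.argv[2]).items():
        print("{:12s} {:10.3f}s {:10d} pairs".format(name, seconds, count))
```

A backend that is missing from the output could not be imported by the interpreter that ran the script.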
from ijson.
@rtobar thanks for the patience
That benchmarking tool did give some insights!
Backends:
- python
Benchmarks:
- long_list
- big_int_object
- big_decimal_object
- big_null_object
- big_bool_object
- big_str_object
- big_longstr_object
- object_with_10_keys
- empty_lists
- empty_objects
I guess I have very few backends available.
But I did install via pip, so as you say, the backend should work out of the box?
To test this I just cloned this repo and ran it on my bare Mac, meaning that pip was not used for this benchmark test.
Is there a way to run the benchmark inside the venv where I ran pip install ijson?
It did also finish the test, but the time is not optimal. Do you think it can be faster?
#mbytes,method,test_case,backend,time,mb_per_sec
321.567, kvitems, locations.json, python, 147.779, 2.176
from ijson.
Ok, so a bit silly, but I just ran brew install yajl and the cloned benchmark gave yajl2 as a possible backend. But is it supposed to show yajl2_c as a possible backend as well, if I can use it? I ask because I see that yajl2 is slower than the two other versions.
#mbytes,method,test_case,backend,time,mb_per_sec
321.567, kvitems, location.json, python, 168.581, 1.907
321.567, kvitems, location.json, yajl2, 104.954, 3.064
But it is still quite time consuming. Should it be this high? Or is there any other tweaking I can do?
from ijson.
@vongohren thanks for all the details, now things are becoming clear. Indeed you were using the python backend, which was my initial suspicion. If you pip install cffi that will also give you access to the yajl2_cffi backend. On the other hand, you still don't have the yajl2_c backend.
What version of python (and MacOS) are you running? If it's 3.8 that might explain it, as I think (from memory) I had to skip generating binary wheels for that version. This is not the case for Linux wheels, which are generated for all python versions correctly.
Now that you have yajl installed, you could try to compile the package yourself, hoping that you will end up with a usable yajl2_c backend for your tests (again, when building your container this shouldn't be a problem, as the package installed with pip should have it). The yajl2_c backend is usually ~10x faster than yajl2 and yajl2_cffi, so you should be down to reasonable times.
from ijson.
I'm running:
macOS: 10.15.3
Python: 3.6.7
I'm getting my code to run when I add import ijson.backends.yajl2_c as ijson
I presume that it is using the right lib for the best speed then?
Or can it fall back to some default mode?
I'm running this simple code:
locations = ijson.kvitems(json_file, 'locations.item')
timestampMsObjects = (v for k, v in locations if k == 'timestampMs')
print(timestampMsObjects)
timestampMs = list(timestampMsObjects)
print(len(timestampMs))
It reports "Parsing the file in 241.8568 seconds", which is not that great. What might the reason be?
This is the benchmarking I've got on the same file. Should I see yajl2_c in that list?
#mbytes,method,test_case,backend,time,mb_per_sec
321.567, kvitems, location.json, python, 168.581, 1.907
321.567, kvitems, location.json, yajl2, 104.954, 3.064
I've also found this library: https://pypi.org/project/jsonslicer/#description. It was able to get through the file, and I could handle all entries in about 98.12 s, without any special configuration.
I would love to understand whether I'm not able to run yajl2_c, and that is why it's not showing its true speed, or whether this is a limit.
Or maybe my code approach is bad?
I'm basically trying to map the JSON file to a new one with just a couple of the map keys included.
from ijson.
I also tried this code on this many entries: 1062126
It never finished; I had to kill it.
parser = ijson.parse(json_file)
f_out.write("{\"locations\":[")
for prefix, event, value in parser:
    if event == "end_map":
        f_out.write("}")
f_out.write("]}")
So maybe I'm taking some wrong approach to your lib?
from ijson.
I might have found the culprit. Suddenly my function was blazingly fast.
I removed a memory profiler.
JsonSlicer did this in 6.6 seconds.
ijson got the running time down to 10.8 seconds.
But still, I got it faster with JsonSlicer.
from ijson.
> This is the benchmarking I've got on the same file. Should I see yajl2_c in that list?
Yes. I'm still puzzled: you said that in your code you do import ijson.backends.yajl2_c as ijson, but you don't see that backend on the benchmark list, meaning that the benchmark can't import it. Maybe try running benchmark.py from a different directory, not directly from the top-level directory of the ijson repo (e.g., put it under /tmp and execute it there); python might be getting confused and loading ijson from the repo instead of the version you have installed.
> So maybe I'm taking some wrong approach to your lib?
It seems you can do better than what you are doing. You mentioned a couple of times you just want to take some map keys out of the JSON stream, and in that case kvitems is overkill, because you don't really need ijson to create objects for you. You can probably do what you need with ijson.parse, which will be faster than kvitems as it doesn't create any objects.
Or also try using ijson.items(f, 'locations.item.timestampMs'). That should return only those values and nothing else, rather than building loads of objects that you end up discarding anyway.
from ijson.
If you’re just looking for the best performance, try pip install ijson==3.0rc2, which is faster than JsonSlicer on their own benchmark.
from ijson.
Thanks @jpmckinney, that is interesting, I will look at it!
from ijson.
@rtobar cool, thanks, I will check it out. It takes some time because this is a hobby project, but I appreciate the feedback. The JsonSlicer maintainers have not provided feedback yet, so I would rather use this repo, which does answer :)
from ijson.
@rtobar @jpmckinney thanks for the follow-up. I will close this as I'm moving onwards with a satisfactory result. The feedback and assistance are much appreciated.
from ijson.