Comments (3)
@dzamo what you are asking is basically not possible with the code you came up with.
When the ijson generators raise a StopIteration exception, it means they have exhausted their input. In your case that means the parse_events generator was exhausted, which in turn means the input data had already been fully parsed.
I think your confusion also comes from a wrong (but common) expectation: you probably think arr will appear only once in your document, and therefore expected ijson.items to finish its work once the array finished. In JSON documents keys are allowed to be repeated, so ijson can't stop processing the document just because the array under the key you are iterating over has finished. For example:
```
$> echo '{"arr": [1, 2, 3], "arr": [1, 2, 3]}' | python -m ijson.dump -m items -p arr.item
#: value
--------
0: 1
1: 2
2: 3
3: 1
4: 2
5: 3
```
There are more reasons, but those are enough I think to explain the situation.
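To illustrate why a prefix-based consumer cannot stop early, here is a minimal sketch of prefix filtering over the event stream for the document above. It assumes the `(prefix, event, value)` tuples that `ijson.parse` yields; the events are written out by hand so the sketch runs without ijson installed, and `filter_prefix` and `tracked` are hypothetical helpers, not ijson's actual implementation of `items`.

```python
# Hand-written stand-in for the (prefix, event, value) events ijson.parse
# would produce for {"arr": [1, 2, 3], "arr": [1, 2, 3]}
EVENTS = [
    ('', 'start_map', None),
    ('', 'map_key', 'arr'),
    ('arr', 'start_array', None),
    ('arr.item', 'number', 1),
    ('arr.item', 'number', 2),
    ('arr.item', 'number', 3),
    ('arr', 'end_array', None),
    ('', 'map_key', 'arr'),
    ('arr', 'start_array', None),
    ('arr.item', 'number', 1),
    ('arr.item', 'number', 2),
    ('arr.item', 'number', 3),
    ('arr', 'end_array', None),
    ('', 'end_map', None),
]

consumed = []  # records every event pulled from the stream


def tracked(events):
    """Hypothetical wrapper that records each event as it is consumed."""
    for ev in events:
        consumed.append(ev)
        yield ev


def filter_prefix(events, prefix):
    """Hypothetical helper: yield scalar values whose prefix matches.

    Because the same key may occur again later in the document, this
    generator cannot know it is done until the event stream itself ends.
    """
    for current, event, value in events:
        if current == prefix:
            yield value


values = list(filter_prefix(tracked(EVENTS), 'arr.item'))
print(values)                         # [1, 2, 3, 1, 2, 3]
print(len(consumed) == len(EVENTS))   # True: the whole stream was read
```

The generator only finishes when its input does, which is the same situation described above: by the time it stops, the whole document has been parsed.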
To implement what you need you'll have to do all your parsing with the results from items.parse. To construct the objects within your arr array, you can take inspiration from how items is implemented (see here) and adapt it to process a single array.
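A minimal sketch of what "process a single array" could look like, under the same assumptions as before: hand-written `(prefix, event, value)` tuples standing in for `ijson.parse` output, scalar items only, and a hypothetical helper name `first_array_items`. It consumes events only up to the array's own `end_array` event and leaves the rest of the stream unread.

```python
def first_array_items(events, array_prefix):
    """Hypothetical helper: yield the scalar items of the next array found
    at array_prefix, then stop without reading past its end_array event.

    Nested maps/arrays inside the array are not handled in this sketch.
    """
    events = iter(events)  # returns the same object if already an iterator
    item_prefix = array_prefix + '.item'
    # Skip forward to the start of the array
    for current, event, value in events:
        if current == array_prefix and event == 'start_array':
            break
    else:
        return  # the array never appeared
    for current, event, value in events:
        if current == array_prefix and event == 'end_array':
            return  # array finished; remaining events stay unread
        if current == item_prefix:
            yield value


# Stand-in events for {"arr": [1, 2], "other": "x"}
events = iter([
    ('', 'start_map', None),
    ('', 'map_key', 'arr'),
    ('arr', 'start_array', None),
    ('arr.item', 'number', 1),
    ('arr.item', 'number', 2),
    ('arr', 'end_array', None),
    ('', 'map_key', 'other'),
    ('other', 'string', 'x'),
    ('', 'end_map', None),
])

print(list(first_array_items(events, 'arr')))  # [1, 2]
print(next(events))  # ('', 'map_key', 'other'): the stream continues
```

Because the same iterator is shared with the caller, parsing can carry on from the point where the array ended.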
@rtobar: thank you for the pointer, your suggestion works nicely. I've now got a generator called items_once which is the same as ijson.items, but it only parses the next encountered occurrence of a prefix. Perhaps this would be a useful addition to the ijson API? Either way, I'll leave my code here in case other users have a similar need.
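```python
from ijson.common import ObjectBuilder


def _items_once(event_stream, prefix):
    '''
    Generator dispatching native Python objects constructed from the ijson
    events under the next occurrence of the given prefix. It is very
    similar to ijson.items except that it will not consume the entire JSON
    stream looking for occurrences of prefix, but rather stop after
    completing the next encountered occurrence of prefix.
    '''
    try:
        # Skip events until the next occurrence of prefix
        current = None
        while current != prefix:
            current, event, value = next(event_stream)
        while current == prefix:
            if event in ('start_map', 'start_array'):
                # Build a container, tracking nesting depth to know when it ends
                object_depth = 1
                builder = ObjectBuilder()
                while object_depth:
                    builder.event(event, value)
                    current, event, value = next(event_stream)
                    if event in ('start_map', 'start_array'):
                        object_depth += 1
                    elif event in ('end_map', 'end_array'):
                        object_depth -= 1
                # The closing event was never fed to the builder, so discard
                # its leftover container state before handing out the value
                del builder.containers[:]
                yield builder.value
            else:
                # Scalar value: yield it directly
                yield value
            current, event, value = next(event_stream)
    except StopIteration:
        # Input exhausted. Since Python 3.7 (PEP 479) a StopIteration leaking
        # out of a generator becomes a RuntimeError, so finish cleanly instead.
        return
```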
Thanks @dzamo for putting the code up, I'm sure somebody else will find it useful too.
I'm reluctant to add something like this to ijson. Not because it's a bad idea in itself, but because it's a bit of a niche use case, and adding it (and maintaining it in the longer term) takes more than copy-pasting your code. In particular, the yajl2_c extension re-implements everything in C, so there's always that duplication to take care of.
If more and more people are finding this to be useful we could re-evaluate; otherwise having the code available here is good enough I think. Thanks again!