Git Product home page Git Product logo

Comments (13)

rtobar avatar rtobar commented on July 21, 2024 1

I already implemented the a new map_type option, it's on the master branch. Could you give it a try?

from ijson.

rtobar avatar rtobar commented on July 21, 2024

ijson.items (I'm assuming you are using this?) doesn't return a dict by default; it returns an iterator that you can use to navigate individual values. You should be able to use that to build your OrderedDict directly with the values coming out of the iteration, but without more details it would be difficult to give more advise.

from ijson.

jpmckinney avatar jpmckinney commented on July 21, 2024

Here is some sample code:

echo "{}{}" > test.json
import ijson.backends.yajl2_cffi as ijson

with open('test.json', 'rb') as f:
    for item in ijson.common.items(ijson.parse(f, multiple_values=True), ''):
        print(type(item))

Output is:

<class 'dict'>
<class 'dict'>

from ijson.

rtobar avatar rtobar commented on July 21, 2024

In your example you are selecting the top-level element, which is an object; thus you get dictionaries. Have you had a look at the examples in https://github.com/ICRAR/ijson/blob/master/README.rst? I think you are basically after the lower-level ijson.parse function, but again without realistic JSON content I can't say for sure.

from ijson.

jpmckinney avatar jpmckinney commented on July 21, 2024

I essentially want the option for this line to be map = OrderedDict() instead of map = {}: https://github.com/isagalaev/ijson/blob/e252a50db34b71cc2b5e0b9a77cd76dee8e95005/ijson/common.py#L116

I can re-implement ijson.common.items to use a new sub-class of ObjectBuilder that contains the change above, but that seems like a lot of effort to get a change in behaviour that is very commonly used in the standard library's json module.

from ijson.

rtobar avatar rtobar commented on July 21, 2024

There are a couple of gotchas with modifying the object builder directly:

  • It is used by most, but not all backends (the C back-end re-implements this logic in C for efficiency), so it's not exactly a one-liner.
  • It would affect all levels in your JSON structure, which is not necessarily what you want. As far as I understand it, you want to preserve the order at the highest level, not necessarily in the deeper objects.

So again, I think such a change would be a bit of an overkill.

On the other hand, I think you basically want a modified version of this: isagalaev#62 (comment)

from collections import OrderedDict

import ijson
from ijson.common import ObjectBuilder

def objects(data):
    key = '-'
    builder = None
    for prefix, event, value in ijson.parse(data):
        if not prefix and event == 'map_key':
            if builder:
                yield key, builder.value
            key = value
            builder = ObjectBuilder()
        elif prefix.startswith(key):
            builder.event(event, value)
    if builder:
        yield key, builder.value

with open('json.json', 'rb') as data:
    result = OrderedDict(objects(data))
for key, value in result.items():
    print(key, value)

from ijson.

jpmckinney avatar jpmckinney commented on July 21, 2024

I do want it to affect all levels :) (like with the object_pairs_hook in the standard library). Converting to an OrderedDict at the top level is easy – just OrderedDict(item) in my earlier snippet.

from ijson.

jpmckinney avatar jpmckinney commented on July 21, 2024

Right now, I need to do something like this (won't work with all backends):

import ijson.backends.yajl2_cffi as ijson

# Copy of ijson.common.items, using different builder.
def items(prefixed_events, prefix):
    prefixed_events = iter(prefixed_events)
    try:
        while True:
            current, event, value = next(prefixed_events)
            if current == prefix:
                if event in ('start_map', 'start_array'):
                    builder = OrderedObjectBuilder()
                    end_event = event.replace('start', 'end')
                    while (current, event) != (prefix, end_event):
                        builder.event(event, value)
                        current, event, value = next(prefixed_events)
                    del builder.containers[:]
                    yield builder.value
                else:
                    yield value
    except StopIteration:
        pass


# Copy of ObjectBuilder, using OrderedDict instead of dict.
class OrderedObjectBuilder(ijson.common.ObjectBuilder):
    def event(self, event, value):
        if event == 'start_map':
            map = OrderedDict()
            self.containers[-1](map)

            def setter(value):
                map[self.key] = value
            self.containers.append(setter)
        else:
            super().event(event, value)

Later, my code calls items.

from ijson.

rtobar avatar rtobar commented on July 21, 2024

@jpmckinney thanks for the pointer to the mechanism used by the standard lib, I actually didn't know about it. Such a generic solution sounds good actually i.e., provide a object_pairs_hook argument or similar that can be used by the object builder. Do you think you could provide a PR with the change, including a test? Otherwise I could implement when I get some time, probably next week. Note that the C backend will need this as well.

from ijson.

rtobar avatar rtobar commented on July 21, 2024

@jpmckinney while we are at this, maybe offering an option for using something other than lists could also be a possibility worth considering.

from ijson.

jpmckinney avatar jpmckinney commented on July 21, 2024

I'm not sure that I can get to it this week – and I'm not familiar with Python in C.

Using something other than lists sounds interesting; however, it hasn't come up as an option in the standard library. I think it's fine to start with object_pairs_hook.

The standard library does allow alternative constructors through parse_float, parse_int, parse_constant, and object_hook (which is lower priority than object_pairs_hook). Another optional parameter is cls, which allows even more customization of decoding. However, I think these can be implemented later if there is demand (for example, I prefer ijson's default behaviour of using Decimal instead of float).

from ijson.

jpmckinney avatar jpmckinney commented on July 21, 2024

It works! Thanks

from ijson.

rtobar avatar rtobar commented on July 21, 2024

Great! I'll close then for now, and will also adjust the title for future reference.

from ijson.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.