Comments (7)

rtobar avatar rtobar commented on August 23, 2024

@jpmckinney all good questions! So:

  • I know it compiles with PyPy (we even publish binary wheels provided by cibuildwheel). About running it: I'm not a PyPy user myself, so I've only ever tried it sporadically to check that the tests run, and they have. I don't do it very often though, so there could be issues that I don't know of (but the CI tests that run when building the wheels always pass with PyPy).
  • I don't know about performance, I've never properly measured it (again, not being a PyPy user). Since the Python C API implementation for PyPy is slower than for CPython I'd assume yajl2_c runs slower in PyPy than in CPython, but I don't know how it fares with respect to the other backends. If you are willing to put some numbers in I'd be interested to see them, and potentially change the order in which backends are tried so that a different one becomes the default on PyPy. You can use the benchmark.py utility in the top-level directory to run one of the built-in synthetic scenarios, or to run against your own JSON files.
  • I think the "python: pure Python parser, good to use with PyPy" phrase is mostly a historical relic, as all backends should be good to use with PyPy really. Based on the benchmark results it could still be that this is the fastest in PyPy though.
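For anyone who wants to check which backends actually load in their interpreter before benchmarking, here is a minimal sketch. It assumes the backend modules live under ijson.backends.<name>, as the README's import examples suggest; the helper name available_backends is made up for illustration, and it returns an empty list if ijson is not installed at all.

```python
import importlib

# Backend names as they appear in this thread.
BACKENDS = ("yajl2_c", "yajl2_cffi", "yajl2", "python")


def available_backends(names=BACKENDS):
    """Return the subset of `names` that can be imported in this interpreter."""
    found = []
    for name in names:
        try:
            importlib.import_module("ijson.backends." + name)
        except (ImportError, OSError):
            # Missing wheel, missing yajl library, or ijson not installed.
            continue
        found.append(name)
    return found


if __name__ == "__main__":
    print(available_backends())
```

Running it under both pypy and python shows quickly whether a compiled backend is missing from the installed wheel.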

from ijson.

jpmckinney avatar jpmckinney commented on August 23, 2024

Thanks!

That jogged my memory a bit – I do something unusual in my code, where I build a dict in which some values are generators. I then use this code when I need to serialize the dict to JSON.

https://github.com/open-contracting/ocdskit/blob/9984b80b524c0a57222f10f76a209bc906c09799/ocdskit/util.py#L40-L68

Somewhere in there, the combination of generators and ijson caused a C error.
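To illustrate the general pattern (this is not the linked ocdskit code, only a simplified stand-in), a dict with generator values can be serialized by giving json.dumps a default hook. The sketch below materializes each generator into a list, which gives up the streaming behaviour the real code is after; the name generator_default is hypothetical.

```python
import json
import types


def generator_default(obj):
    """json `default` hook: turn generators into lists so they can be encoded."""
    if isinstance(obj, types.GeneratorType):
        return list(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


data = {"name": "example", "items": (i * i for i in range(4))}
print(json.dumps(data, default=generator_default))
# {"name": "example", "items": [0, 1, 4, 9]}
```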

Anyway, I'm trying to reproduce it now, but I can't get pip to find YAJL headers when using PyPy (I can use the yajl2_c backend in CPython, but I think it's included in the wheel). python -c 'import ijson; print(ijson.backend)' just returns 'python' in my PyPy environment.

rtobar avatar rtobar commented on August 23, 2024

That's interesting about the backends available to you. I just double-checked one of the latest wheels I published the other day for ijson 3.2.0 under https://pypi.org/project/ijson/#files (pypy39, manylinux, x86_64), and it contained both the compiled yajl library and the yajl2_c backend. I also gave it a quick whirl:

$ sudo apt install pypy3-venv
$ pypy3 -m venv lala
$ source lala/bin/activate
(lala) $ pypy -c 'import ijson; print(ijson.backend)'
yajl2_c

Boom!

And as a tiny benchmark:

(lala) $ cp ~/scm/git/ijson/benchmark.py . # otherwise it uses *that* copy of ijson and doesn't load all backends properly
(lala) $ pypy benchmark.py 
#mbytes,method,test_case,backend,time,mb_per_sec
0.191, basic_parse, long_list, python, 0.036, 5.326
0.191, basic_parse, long_list, yajl2, 0.196, 0.973
0.191, basic_parse, long_list, yajl2_cffi, 0.030, 6.262
0.191, basic_parse, long_list, yajl2_c, 0.062, 3.061
1.886, basic_parse, big_int_object, python, 0.107, 17.704
1.886, basic_parse, big_int_object, yajl2, 0.319, 5.905
1.886, basic_parse, big_int_object, yajl2_cffi, 0.054, 35.115
1.886, basic_parse, big_int_object, yajl2_c, 0.146, 12.930
2.077, basic_parse, big_decimal_object, python, 0.236, 8.783
2.077, basic_parse, big_decimal_object, yajl2, 0.379, 5.475
2.077, basic_parse, big_decimal_object, yajl2_cffi, 0.100, 20.775
2.077, basic_parse, big_decimal_object, yajl2_c, 0.332, 6.248
1.801, basic_parse, big_null_object, python, 0.094, 19.090
1.801, basic_parse, big_null_object, yajl2, 0.273, 6.598
1.801, basic_parse, big_null_object, yajl2_cffi, 0.040, 44.615
1.801, basic_parse, big_null_object, yajl2_c, 0.101, 17.829
1.849, basic_parse, big_bool_object, python, 0.078, 23.842
1.849, basic_parse, big_bool_object, yajl2, 0.288, 6.426
1.849, basic_parse, big_bool_object, yajl2_cffi, 0.044, 42.343
1.849, basic_parse, big_bool_object, yajl2_c, 0.096, 19.163
2.649, basic_parse, big_str_object, python, 0.095, 27.807
2.649, basic_parse, big_str_object, yajl2, 0.353, 7.501
2.649, basic_parse, big_str_object, yajl2_cffi, 0.057, 46.466
2.649, basic_parse, big_str_object, yajl2_c, 0.147, 18.059
8.000, basic_parse, big_longstr_object, python, 0.146, 54.769
8.000, basic_parse, big_longstr_object, yajl2, 0.480, 16.654
8.000, basic_parse, big_longstr_object, yajl2_cffi, 0.057, 141.468
8.000, basic_parse, big_longstr_object, yajl2_c, 0.164, 48.791
19.264, basic_parse, object_with_10_keys, python, 0.764, 25.209
19.264, basic_parse, object_with_10_keys, yajl2, 3.049, 6.318
19.264, basic_parse, object_with_10_keys, yajl2_cffi, 0.461, 41.819
19.264, basic_parse, object_with_10_keys, yajl2_c, 1.902, 10.128
0.381, basic_parse, empty_lists, python, 0.036, 10.482
0.381, basic_parse, empty_lists, yajl2, 0.113, 3.375
0.381, basic_parse, empty_lists, yajl2_cffi, 0.026, 14.803
0.381, basic_parse, empty_lists, yajl2_c, 0.051, 7.532
0.381, basic_parse, empty_objects, python, 0.021, 18.226
0.381, basic_parse, empty_objects, yajl2, 0.282, 1.355
0.381, basic_parse, empty_objects, yajl2_cffi, 0.022, 17.367
0.381, basic_parse, empty_objects, yajl2_c, 0.050, 7.614

So cffi seems to be the winner in this case.

It'd be good to see more evidence that gives these backends a natural sorting order in which we can recommend them under pypy.
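One rough way to turn benchmark.py output into such an ordering is to average the mb_per_sec column per backend and sort. The sketch below does that for the first eight PyPy rows above; the function name rank_backends is made up, and a plain mean over test cases is only one crude choice of summary (a per-test-case ranking may tell a different story).

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# First two test cases from the PyPy run above.
SAMPLE = """\
#mbytes,method,test_case,backend,time,mb_per_sec
0.191, basic_parse, long_list, python, 0.036, 5.326
0.191, basic_parse, long_list, yajl2, 0.196, 0.973
0.191, basic_parse, long_list, yajl2_cffi, 0.030, 6.262
0.191, basic_parse, long_list, yajl2_c, 0.062, 3.061
1.886, basic_parse, big_int_object, python, 0.107, 17.704
1.886, basic_parse, big_int_object, yajl2, 0.319, 5.905
1.886, basic_parse, big_int_object, yajl2_cffi, 0.054, 35.115
1.886, basic_parse, big_int_object, yajl2_c, 0.146, 12.930
"""


def rank_backends(text):
    """Return backend names sorted by mean throughput, fastest first."""
    speeds = defaultdict(list)
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header
        backend, mb_per_sec = row[3], float(row[5])
        speeds[backend].append(mb_per_sec)
    return sorted(speeds, key=lambda b: mean(speeds[b]), reverse=True)


print(rank_backends(SAMPLE))
# ['yajl2_cffi', 'python', 'yajl2_c', 'yajl2']
```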

For reference, this is the same benchmark with CPython 3.10:

(ijson) $ python benchmark.py 
#mbytes,method,test_case,backend,time,mb_per_sec
0.191, basic_parse, long_list, python, 0.154, 1.235
0.191, basic_parse, long_list, yajl2, 0.091, 2.093
0.191, basic_parse, long_list, yajl2_cffi, 0.089, 2.154
0.191, basic_parse, long_list, yajl2_c, 0.008, 24.960
1.886, basic_parse, big_int_object, python, 0.327, 5.764
1.886, basic_parse, big_int_object, yajl2, 0.177, 10.642
1.886, basic_parse, big_int_object, yajl2_cffi, 0.167, 11.311
1.886, basic_parse, big_int_object, yajl2_c, 0.017, 107.875
2.077, basic_parse, big_decimal_object, python, 0.343, 6.053
2.077, basic_parse, big_decimal_object, yajl2, 0.192, 10.839
2.077, basic_parse, big_decimal_object, yajl2_cffi, 0.177, 11.746
2.077, basic_parse, big_decimal_object, yajl2_c, 0.028, 74.584
1.801, basic_parse, big_null_object, python, 0.270, 6.667
1.801, basic_parse, big_null_object, yajl2, 0.101, 17.869
1.801, basic_parse, big_null_object, yajl2_cffi, 0.111, 16.208
1.801, basic_parse, big_null_object, yajl2_c, 0.014, 131.166
1.849, basic_parse, big_bool_object, python, 0.272, 6.803
1.849, basic_parse, big_bool_object, yajl2, 0.106, 17.429
1.849, basic_parse, big_bool_object, yajl2_cffi, 0.117, 15.738
1.849, basic_parse, big_bool_object, yajl2_c, 0.026, 70.817
2.649, basic_parse, big_str_object, python, 0.312, 8.488
2.649, basic_parse, big_str_object, yajl2, 0.151, 17.525
2.649, basic_parse, big_str_object, yajl2_cffi, 0.142, 18.710
2.649, basic_parse, big_str_object, yajl2_c, 0.016, 163.509
8.000, basic_parse, big_longstr_object, python, 0.323, 24.801
8.000, basic_parse, big_longstr_object, yajl2, 0.153, 52.134
8.000, basic_parse, big_longstr_object, yajl2_cffi, 0.143, 56.138
8.000, basic_parse, big_longstr_object, yajl2_c, 0.016, 510.421
19.264, basic_parse, object_with_10_keys, python, 3.236, 5.954
19.264, basic_parse, object_with_10_keys, yajl2, 1.582, 12.178
19.264, basic_parse, object_with_10_keys, yajl2_cffi, 1.490, 12.932
19.264, basic_parse, object_with_10_keys, yajl2_c, 0.168, 114.446
0.381, basic_parse, empty_lists, python, 0.159, 2.398
0.381, basic_parse, empty_lists, yajl2, 0.041, 9.251
0.381, basic_parse, empty_lists, yajl2_cffi, 0.073, 5.217
0.381, basic_parse, empty_lists, yajl2_c, 0.010, 36.912
0.381, basic_parse, empty_objects, python, 0.160, 2.390
0.381, basic_parse, empty_objects, yajl2, 0.041, 9.342
0.381, basic_parse, empty_objects, yajl2_cffi, 0.073, 5.203
0.381, basic_parse, empty_objects, yajl2_c, 0.010, 36.672

jpmckinney avatar jpmckinney commented on August 23, 2024

Ah, I'm on macOS arm64, so that might be the reason: there's no arm64 wheel for PyPy on macOS.

So it looks like on PyPy (on that benchmark): yajl2_cffi > python > yajl2_c > yajl2.

That said, yajl2_c on CPython seems fastest all around.

rtobar avatar rtobar commented on August 23, 2024

Yes, that seems to be more or less the order. Still, I'd hesitate to make a decision based on those numbers alone; if you (or someone else) could provide more real-life numbers it'd be great. Things might be different on macOS arm64, for example.

jpmckinney avatar jpmckinney commented on August 23, 2024

I probably won't be able to, as I can't figure out how to make ijson find YAJL headers on PyPy. Feel free to close the issue.

rtobar avatar rtobar commented on August 23, 2024

OK, thanks for the feedback! I'll close this now, but this issue should be a good reference for future PyPy users.
