Git Product home page Git Product logo

Comments (11)

rtobar avatar rtobar commented on July 21, 2024

@Alexander255 I have been working on a set of changes implementing this, but they are not fully fleshed, so I haven't made them available for testing publicly yet.

The main idea is to decouple I/O from the parsing logic, handling only buffers of data as input, and have these new methods be the building-blocks of the rest of the library. I am doing this using coroutines (as in "generators with (yield) expressions, not python3.5+-specific async coroutines) which expect data to be sent to them, work on it, and then send their results to another couroutine. These coroutines can effectively be chained together, and also allow for user-provided targets. So all in all the current pull pattern gets inverted into a push pattern at the low level. The current set of "pull" generators (basic_parse, parse and items) is then easily built on top of that.

All of this is working fine for all backends with no drop in performance, with the sole exception of the C backend, which I haven't finished porting yet (but it's about 50% done). Apart from that I also have a first version of async for-enabled support functions (basic_parse_async and so on) for all backends which will hopefully make it easier to use ijson with async I/O. Other constructs like the one you suggest can be easily added as well with this new framework.

I expect all these changes will become ijson 3.0, and hopefully t won't take much more longer to finish.

from ijson.

ntninja avatar ntninja commented on July 21, 2024

@rtobar: That sounds great!

(The interface in my example was just an illustration on the general concept to get the general idea across, but what you proposed is much better thought out anyways.)

from ijson.

rtobar avatar rtobar commented on July 21, 2024

@Alexander255 the very first version on this work is finally available. If you could, would you mind trying out the code in the new coros branch? Is should have new *_async methods when using python 3.5+, which receive an asyncio file-like object, and that can be used in an async for construct. I haven't updated the README yet with how to use the new methods, but you can see the tests_asyncio.py file for simple examples.

from ijson.

rtobar avatar rtobar commented on July 21, 2024

New interfaces now exposed on the {{master}} branch. Have a look at the README file to learn more about this. There is a release candidate up in PyPI too (3.0rc1) with these changes, so people can go ahead and test them and provide feedback before a final 3.0 release is done.

from ijson.

ntninja avatar ntninja commented on July 21, 2024

Sorry about only looking at this now. While I haven't used it yet, the offered push interface will definitely work for our needs, so thank you about that!

Some feedback: It looks like the stream reading code of the *_async variants are specific to asyncio's SteamReader interface through (they use its definition of .read() to grab more data).
Unfortunately our library exclusively uses trio which has a slightly different interface so I cannot reuse this code although it would otherwise be a perfect fit for what we're trying to do. As such, could you I convince you to change the line:

data = await self.f.read(self.buf_size)

in utils35.py to this:

if hasattr(self.f, "receive_some"):  # Trio's ReceiveStream
    data = await self.f.receive_some(self.buf_size)
else:  # AsyncIO's StreamReader + Trio's Async Files + CurIO's SocketStream + CurIO's FileStream
    data = await self.f.read(self.buf_size)

I know it's dumb to have exactly one outlier in that list, but it's just the way things are right now… The difference in interface does make some sense when you view it purely from the perspective of what the interface represents within the trio system (some random thing that gives you data) – from the PoV of somebody trying to be ecosystem agnostic it's of course a totally unnecessary complication, but when you (as they did/do) only have trio consumers in mind I think their decision was by itself quite understandable.

from ijson.

rtobar avatar rtobar commented on July 21, 2024

@Alexander255: let me ask the inverse question: would it be possible for you to alter your trio objects so they have a read method that behaves like receive_some?

f = the_trio_file_type()
f.read = f.receive_some
async for object in ijson.items_async(f, ''):
   pass

If it's not possible to add members to trio objects, a thin wrapper would also do.

On the other hand, I'm the first to admit that I'm not very well versed on the asyncio ecosystem, so I'm not sure what's the most popular thing out there. If it turns out there is a considerable amount of users who have receive_some-like objects it would make sense to add the extra support, but for now I think I'll refrain for doing so. As you point out, having one exception is not great if we want to be agnostic to whatever people are using. And adding one exception will also set a precedent for adding more, which again is not great.

from ijson.

ntninja avatar ntninja commented on July 21, 2024

@rtobar: Obviously I can wrapper these up, but it's just not the same to having it just work without any further ado. I don't blame you for that decision though, I'd probably done the same in your stead. 🙂

Let me just request some tiny amendment however: Could you rename the *_async methods to *_asyncio to make it clear which AIO framework they target? Just to avoid surprises, etc.

PS: Regarding AIO frameworks, I think it's very unlikely that we are going to see more of them anytime soon. asyncio is, and will be for a long time, the go-to solution for AIO in Python. trio is the more innovative hidden gem that tries new things. curio has mostly be superseded by trio at this point, save for cases were one has very stringent performance needs (but still use Python for some reason).

from ijson.

rtobar avatar rtobar commented on July 21, 2024

@Alexander255 the new name is a very good suggestion, and easy to get done. I'll do it for the next rc. Thanks for the clarification on frameworks too, one day I might need to delve deeper into that world.

from ijson.

rtobar avatar rtobar commented on July 21, 2024

Mmmm... actually thinking about it again I'm not sure anymore... the async suffix already matches the fact that they have to be used in async def functions, and most usually inside async for statements, so having them end with asyncio would break that. I tried looking for examples of libraries that offered the two flavours of their methods, but a quick search didn't yield relevant results.

from ijson.

ntninja avatar ntninja commented on July 21, 2024

@rtobar: All the AIO frameworks use async def functions – it's the core primitive they all depend on. My suggestion on using the _asyncio suffix is based on the fact that your code does not expect just any async def function/iterable, but rather the specific data stream interface exposed by the asyncio library. That doesn't mean it can't work with other stuff as well, I've brought up plenty of examples above, but that is the API it is targeting and guaranteed to be compatible with.

from ijson.

rtobar avatar rtobar commented on July 21, 2024

@Alexander255 yes, I understand that while there are multiple frameworks/APIs all of them share the same underlying mechanism for declaring coroutines (async def), and that the new methods are designed to work in particular with asyncio-like objects.

The point I was trying to make was different: XYZ_asyncio would mean "this is the version of XYZ that works with asyncio (hence it's asynchronous)", but the meaning I what to convey with the XYZ_async naming is "this is the asynchronous version of XYZ", and not to reflect the type of input they accept. The original "XYZ" names actually don't hint as to what type of object they work with, and even though they work with objects behaving like the classes in the io module they are not called XYZ_io.

On top of that, in both XYZ and XYZ_async cases, I've also been thinking on adding support for accepting generators too (sync and async, respectively), yet another reason for not binding the particular type of input into the names.

from ijson.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.