Comments (11)
@Alexander255 I have been working on a set of changes implementing this, but they are not fully fleshed, so I haven't made them available for testing publicly yet.
The main idea is to decouple I/O from the parsing logic, handling only buffers of data as input, and have these new methods be the building-blocks of the rest of the library. I am doing this using coroutines (as in "generators with (yield)
expressions, not python3.5+-specific async
coroutines) which expect data to be sent to them, work on it, and then send their results to another couroutine. These coroutines can effectively be chained together, and also allow for user-provided targets. So all in all the current pull pattern gets inverted into a push pattern at the low level. The current set of "pull" generators (basic_parse
, parse
and items
) is then easily built on top of that.
All of this is working fine for all backends with no drop in performance, with the sole exception of the C backend, which I haven't finished porting yet (but it's about 50% done). Apart from that I also have a first version of async for
-enabled support functions (basic_parse_async
and so on) for all backends which will hopefully make it easier to use ijson with async I/O. Other constructs like the one you suggest can be easily added as well with this new framework.
I expect all these changes will become ijson 3.0, and hopefully t won't take much more longer to finish.
from ijson.
@rtobar: That sounds great!
(The interface in my example was just an illustration on the general concept to get the general idea across, but what you proposed is much better thought out anyways.)
from ijson.
@Alexander255 the very first version on this work is finally available. If you could, would you mind trying out the code in the new coros
branch? Is should have new *_async
methods when using python 3.5+, which receive an asyncio file-like object, and that can be used in an async for
construct. I haven't updated the README yet with how to use the new methods, but you can see the tests_asyncio.py
file for simple examples.
from ijson.
New interfaces now exposed on the {{master}} branch. Have a look at the README file to learn more about this. There is a release candidate up in PyPI too (3.0rc1) with these changes, so people can go ahead and test them and provide feedback before a final 3.0 release is done.
from ijson.
Sorry about only looking at this now. While I haven't used it yet, the offered push interface will definitely work for our needs, so thank you about that!
Some feedback: It looks like the stream reading code of the *_async
variants are specific to asyncio
's SteamReader interface through (they use its definition of .read()
to grab more data).
Unfortunately our library exclusively uses trio
which has a slightly different interface so I cannot reuse this code although it would otherwise be a perfect fit for what we're trying to do. As such, could you I convince you to change the line:
data = await self.f.read(self.buf_size)
in utils35.py to this:
if hasattr(self.f, "receive_some"): # Trio's ReceiveStream
data = await self.f.receive_some(self.buf_size)
else: # AsyncIO's StreamReader + Trio's Async Files + CurIO's SocketStream + CurIO's FileStream
data = await self.f.read(self.buf_size)
I know it's dumb to have exactly one outlier in that list, but it's just the way things are right now… The difference in interface does make some sense when you view it purely from the perspective of what the interface represents within the trio system (some random thing that gives you data) – from the PoV of somebody trying to be ecosystem agnostic it's of course a totally unnecessary complication, but when you (as they did/do) only have trio consumers in mind I think their decision was by itself quite understandable.
from ijson.
@Alexander255: let me ask the inverse question: would it be possible for you to alter your trio objects so they have a read
method that behaves like receive_some
?
f = the_trio_file_type()
f.read = f.receive_some
async for object in ijson.items_async(f, ''):
pass
If it's not possible to add members to trio objects, a thin wrapper would also do.
On the other hand, I'm the first to admit that I'm not very well versed on the asyncio
ecosystem, so I'm not sure what's the most popular thing out there. If it turns out there is a considerable amount of users who have receive_some
-like objects it would make sense to add the extra support, but for now I think I'll refrain for doing so. As you point out, having one exception is not great if we want to be agnostic to whatever people are using. And adding one exception will also set a precedent for adding more, which again is not great.
from ijson.
@rtobar: Obviously I can wrapper these up, but it's just not the same to having it just work without any further ado. I don't blame you for that decision though, I'd probably done the same in your stead.
Let me just request some tiny amendment however: Could you rename the *_async
methods to *_asyncio
to make it clear which AIO framework they target? Just to avoid surprises, etc.
PS: Regarding AIO frameworks, I think it's very unlikely that we are going to see more of them anytime soon. asyncio
is, and will be for a long time, the go-to solution for AIO in Python. trio
is the more innovative hidden gem that tries new things. curio
has mostly be superseded by trio
at this point, save for cases were one has very stringent performance needs (but still use Python for some reason).
from ijson.
@Alexander255 the new name is a very good suggestion, and easy to get done. I'll do it for the next rc. Thanks for the clarification on frameworks too, one day I might need to delve deeper into that world.
from ijson.
Mmmm... actually thinking about it again I'm not sure anymore... the async
suffix already matches the fact that they have to be used in async def
functions, and most usually inside async for
statements, so having them end with asyncio
would break that. I tried looking for examples of libraries that offered the two flavours of their methods, but a quick search didn't yield relevant results.
from ijson.
@rtobar: All the AIO frameworks use async def
functions – it's the core primitive they all depend on. My suggestion on using the _asyncio
suffix is based on the fact that your code does not expect just any async def
function/iterable, but rather the specific data stream interface exposed by the asyncio
library. That doesn't mean it can't work with other stuff as well, I've brought up plenty of examples above, but that is the API it is targeting and guaranteed to be compatible with.
from ijson.
@Alexander255 yes, I understand that while there are multiple frameworks/APIs all of them share the same underlying mechanism for declaring coroutines (async def
), and that the new methods are designed to work in particular with asyncio
-like objects.
The point I was trying to make was different: XYZ_asyncio
would mean "this is the version of XYZ that works with asyncio (hence it's asynchronous)", but the meaning I what to convey with the XYZ_async
naming is "this is the asynchronous version of XYZ", and not to reflect the type of input they accept. The original "XYZ" names actually don't hint as to what type of object they work with, and even though they work with objects behaving like the classes in the io
module they are not called XYZ_io
.
On top of that, in both XYZ
and XYZ_async
cases, I've also been thinking on adding support for accepting generators too (sync and async, respectively), yet another reason for not binding the particular type of input into the names.
from ijson.
Related Issues (20)
- yajl2_c backend crashes on PyPy3 HOT 19
- Use stacklevel to point str vs bytes warning to user code HOT 3
- Release wheels for 3.11 HOT 6
- How to determine which backend is being used at runtime? HOT 3
- Is the yajl_c backend supported on PyPy? HOT 7
- High level interface to iterate over lists HOT 3
- HighLevelAPI: Raise an error if the prefix does not exist HOT 2
- Is it possible to use multiple prefix HOT 8
- yajl2_c backend for lambda function HOT 2
- How to use ijson to covert string to dict? HOT 3
- How to read json records in chunks using ijson? HOT 4
- Question: is it possible that returing bytes instead of str could speedup parsing? HOT 3
- Thread safety HOT 9
- Full support for byte stream generator HOT 9
- Allow to use ijson package by a relative import HOT 4
- How can I most-efficiently check for a key in the top-level of a json object? HOT 3
- Python3.12 compilation error: ‘PyGenObject’ has no member named ‘gi_code’ HOT 5
- Is it possible to use isjon with Jsonl, ndjson ? HOT 5
- Memory leak on exception handling with yajl2_c backend HOT 6
- _yajl2 backend broken with Python 3.12 HOT 9
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from ijson.