Comments (3)
Hey Raphael! So, filter_chunked is certainly adjacent to some things I've done in the past. Just at first blush, I wondered how this might be rewritten as a composition of chunked_iter, zip, and filter. And it turned out to be pretty annoying!
Something like:
from boltons.iterutils import chunked_iter
key = lambda chunk: [bool(x%2) for x in chunk]
vals = list(range(20))
chunk_size = 5
filtered_chunked = (x for chunk in chunked_iter(vals, chunk_size) for x, allow in zip(chunk, key(chunk)) if allow)
# list(filtered_chunked) -> [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
If it's this annoying, I'm generally for including it. What do you think of the simplified expression above? You'll definitely want to use chunked_iter if you're going to make a generator. We may need to name it filter_chunked_iter, too.
The input iterable is flat, and the output iterable is flat, so that dimensionality matches, and there's no need for a value_transform convenience like with bucketize.
We'll also need a nice, demonstrative docstring. What's your motivating example in this case? Are there a lot of APIs where this batch nature applies?
Thanks again!
from boltons.
What do you think of the simplified expression above?
I'm happy with a generator expression, I'd just split it across lines and use nicer variable names for my PR.
I wouldn't use key as the function-param name, because it suggests it returns some inherent property of the object (like in sorted). I did use just func above, but I think something like batch_pred or chunk_pred (predicate) would suit.
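For illustration, here is a minimal sketch of how the generator expression above might look as a function with the chunk_pred parameter name. The name filter_chunked_iter and its argument order are assumptions taken from this discussion, not an existing boltons API. To keep the sketch dependency-free, a small hypothetical helper _chunks stands in for boltons.iterutils.chunked_iter.

```python
from itertools import islice


def _chunks(iterable, size):
    """Hypothetical stand-in for chunked_iter: yield lists of up to `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def filter_chunked_iter(iterable, chunk_pred, chunk_size):
    """Lazily yield the items that chunk_pred allows.

    chunk_pred receives a whole chunk (a list) and is assumed to return
    one truthy/falsy flag per item, in the same order as the chunk.
    """
    for chunk in _chunks(iterable, chunk_size):
        flags = chunk_pred(chunk)
        for item, allow in zip(chunk, flags):
            if allow:
                yield item


def is_odd_batch(chunk):
    # Same toy predicate as in the expression above, evaluated per chunk.
    return [bool(x % 2) for x in chunk]


odds = list(filter_chunked_iter(range(20), is_odd_batch, 5))
```

The input and output both stay flat, and the predicate is called once per chunk rather than once per item.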
What's your motivating example in this case?
My code is looking for resources that exist in one API but not in another one. Both APIs use paging, have limits on query length, and allow for property in $values semantics.
The code goes over an iterator (generator) of results from the first API (paged, but that's hidden in the generator), takes the objects' ids, and searches for them in the second API. Whatever is returned can be discarded from the input as processed in previous runs. The search step is wrapped in a batch predicate and used in chunked_filter.
One alternative would be collecting all the objects in multiple fetches, but that would only increase the memory footprint, and the list would still need to be chunked for the second API.
Another alternative would be fetching each resource in a separate query, but that would add time overhead from IO and API rate limits.
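The workflow described above could be sketched roughly as follows. Everything here is a hypothetical stand-in: first_api_results simulates the paged generator, SECOND_API_IDS and missing_in_second_api simulate one batched "id in $values" query per chunk, and filter_chunked_iter is an assumed name for the proposed helper, not an existing function.

```python
from itertools import islice

# Hypothetical data: ids that the second API already knows about.
SECOND_API_IDS = {2, 3, 7, 11}
calls = []  # records each simulated batched query


def first_api_results():
    """Hypothetical generator hiding the first API's paging."""
    for i in range(1, 13):
        yield {"id": i}


def missing_in_second_api(chunk):
    """Batch predicate: one simulated query per chunk, one flag per item."""
    ids = [item["id"] for item in chunk]
    calls.append(ids)
    found = SECOND_API_IDS.intersection(ids)  # simulates `id in $values`
    return [item_id not in found for item_id in ids]


def filter_chunked_iter(iterable, chunk_pred, chunk_size):
    """Assumed helper: filter a flat iterable using a per-chunk predicate."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        for item, allow in zip(chunk, chunk_pred(chunk)):
            if allow:
                yield item


missing = [
    resource["id"]
    for resource in filter_chunked_iter(first_api_results(), missing_in_second_api, 5)
]
```

With 12 resources and a chunk size of 5, the second API is queried three times instead of twelve, and at most one chunk of results is held in memory at a time.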
Are there a lot of APIs where this batch nature applies?
Not sure how to answer your question.
One notable example would be SQL. My project is using AirTable, which is a database wannabe with a REST API and a query (formula) language.
We'll also need a nice, demonstrative docstring.
Something like
https://thedailywtf.com/articles/Very,_Very_Well_Documented