Comments (10)
@hantusk this looks very cool and potentially a good workaround. Thanks for sharing!
from pyo3-polars.
This is something I want to get into too. But it needs to be more than a trait, as we want it to work across FFI. On the Rust side there is already AnymousSource. This will be extended to support the new streaming engine.
from pyo3-polars.
Wow, you are quick. I am still working on the example. :D
from pyo3-polars.
For my thesis I am currently looking at how I can hook an existing backend query service into Polars so that it can be used through the LazyFrame API. The resulting LazyFrame would need to be passed from the Rust side to the Python side, as the use case is aimed at data scientists and ML engineers working in Python. From what I gathered, this unfortunately seems to be impossible right now, so I want to +1 this issue, as it would open up a lot of possibilities for the Polars ecosystem!
from pyo3-polars.
Can anyone suggest how to work around this limitation? That is, how can I "extend polars" to support scanning my custom file formats?
I looked at https://github.com/universalmind303/polars-mongo which seems clean and straight-forward, but suffers from the same limitation as in #67.
from pyo3-polars.
You might be able to scan your custom file formats using fsspec. Here's an example: https://csvbase.com/blog/7.
from pyo3-polars.
Hi @ritchie46, I've been using the newly released IO plugins and they work well, thank you.
I have a question regarding `n_rows`. In the docstring it says:
n_rows: Materialize only n rows from the source. The reader can stop when `n_rows` are read.
Is it before or after the predicate is applied? In this context, what's the meaning of "materialize"?
Thanks again for implementing this!
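Why the order matters can be seen in a small library-free sketch. It uses plain Python lists as a stand-in for a data source; the helper names `limit_then_filter` and `filter_then_limit` are hypothetical and are not part of the Polars API. The sketch only shows that the two orderings can return different rows, not which one Polars uses.

```python
# Plain lists stand in for rows of a source; no Polars involved.
rows = [1, 2, 3, 4, 5, 6]
n_rows = 3


def is_even(x: int) -> bool:
    return x % 2 == 0


def limit_then_filter(rows, predicate, n_rows):
    # Read at most n_rows from the source, then apply the predicate.
    return [r for r in rows[:n_rows] if predicate(r)]


def filter_then_limit(rows, predicate, n_rows):
    # Apply the predicate first, then keep at most n_rows matching rows.
    return [r for r in rows if predicate(r)][:n_rows]


limited_first = limit_then_filter(rows, is_even, n_rows)
filtered_first = filter_then_limit(rows, is_even, n_rows)
```

Both results stay within the `n_rows` limit, but the first ordering can return far fewer matching rows than the second.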
from pyo3-polars.
Here is the working example: https://github.com/pola-rs/pyo3-polars/tree/main/example/io_plugin
from pyo3-polars.
@ritchie46 thank you.
I understand from this that `n_rows` can be used regardless of predicate. I have another question. Can I modify `n_rows` to account for batch sizes? e.g.:
def _read_my_format_impl(path: str, ...) -> pl.DataFrame: ...

def scan_my_format(paths, ...) -> pl.LazyFrame:
    def _read_my_format(with_columns, predicate, n_rows, batch_size):
        for path in paths:
            df = _read_my_format_impl(path, columns=with_columns, n_rows=n_rows)
            if predicate is not None:
                df = df.filter(predicate)
            yield df
            if n_rows is not None:
                n_rows -= df.height  # <-- is this legit?
                if n_rows <= 0:
                    break

    return register_io_source(callable=_read_my_format, schema=...)
from pyo3-polars.
Maybe. You are not allowed to return more than `n_rows`. It is the upper limit.
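One way to keep a multi-batch reader under that upper limit is to track a remaining-row budget and truncate the last batch. The following is a minimal sketch of that bookkeeping only, with plain lists of dicts standing in for `pl.DataFrame` batches; `limited_batches` is a hypothetical helper, not a Polars function. It assumes the budget counts rows actually yielded (after the predicate), which is a conservative choice that the thread does not confirm as the intended semantics.

```python
from typing import Callable, Iterable, Iterator, Optional


def limited_batches(
    batches: Iterable[list],
    predicate: Optional[Callable[[dict], bool]] = None,
    n_rows: Optional[int] = None,
) -> Iterator[list]:
    """Yield filtered batches whose total length never exceeds n_rows."""
    remaining = n_rows  # None means "no limit"
    for batch in batches:
        # Stop reading as soon as the budget is used up.
        if remaining is not None and remaining <= 0:
            break
        if predicate is not None:
            batch = [row for row in batch if predicate(row)]
        if remaining is not None:
            # Truncate so this batch cannot push the total over the limit.
            batch = batch[:remaining]
            remaining -= len(batch)
        yield batch


source = [[{"x": 1}, {"x": 2}, {"x": 3}], [{"x": 4}, {"x": 5}], [{"x": 6}]]
out = list(limited_batches(source, predicate=lambda r: r["x"] > 1, n_rows=3))
total = sum(len(b) for b in out)
```

In a real IO source the slicing step would presumably be something like `df.head(remaining)` on each DataFrame, with `df.height` used for the bookkeeping.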
from pyo3-polars.