Comments (7)
Title
I concur.
The definition of the column is haversine=pl.col("floats").dist.haversine("floats", "floats", "floats", "floats")
with sample column floats [5.6, -1245.8, 242.224]
.
And the unpacking of the data is:
#[polars_expr(output_type_func=haversine_output)]
fn haversine(inputs: &[Series]) -> PolarsResult<Series> {
let out = match inputs[0].dtype() {
DataType::Float32 => {
let start_lat = inputs[0].f32().unwrap();
let start_long = inputs[1].f32().unwrap();
let end_lat = inputs[2].f32().unwrap();
let end_long = inputs[3].f32().unwrap();
// etc
So from the other examples, we understand that inputs[0]
is the column on which the plugin function is applied, and inputs[1, 2, 3]
are the first 3 column names, and the last one is not used ! If this is correct, this is a bit misleading, isn't it ?
The pig_latinnify
and pig_latinnify_with_parallelism
make sense, work as expected (I cannot judge the parallelism) and are didactic demos, and so is the kwargs example.
But I would argue a a potentially very generic sample function with several columns as input and several columns as ouput would be more helpful, and, in terms of API it would quite readable to have the inputs and outputs as pl.struct
.
Here is the python shell, super generic:
@pl.api.register_expr_namespace("demo")
class Demo:
def __init__(self, expr: pl.Expr):
self._expr = expr
def demo_calc(self) -> pl.Expr:
return self._expr._register_plugin(
lib=lib,
symbol="demo_calc",
is_elementwise=True,
kwargs={},
)
A vaguely worded rust part would be:
// struct input column
struct FuncStructIn {
a_float: bool,
a_int: i64,
a_string: String,
}
// struct output column
struct FuncStructOut {
a_valid: bool,
a_one: f64,
a_two: f64,
}
// output type is a struct with schema FuncStructOut
#[polars_expr(output_type=FuncStructOut)]
fn demo_calc(inputs: &[Series], kwargs: PigLatinKwargs) -> PolarsResult<Series> {
// unpack struct to individual variables, with all the relevant error checking
// obviously this does not work but you get the idea
let s_in: FuncStructIn = inputs[0].struct();
let a_float : &ChunkedArray<T>= = s_in.a_float;
let a_int = s_in.a_int;
let a_string = s_in.a_string;
// impl_demo_calc contains the actuall processing
let s_out: FuncStructOut = impl_demo_calc(a_float, a_int, a_string)
// returns result
// again syntax is super approximative
Ok(s_out.into_series())
}
// approx syntax
pub(super) fn impl_demo_calc(
a_float: &ChunkedArray<Float64Type>,
a_int: &ChunkedArray<Int64Type>,
a_string: &ChunkedArray<Utf8Type>,
) -> PolarsResult<ChunkedArray<FuncStructOut>> {
// can be anything but return a FuncStructOut shaped struct
// unless compute error due to bad input
}
// once the this is done, in a second stage, probably a _with parallelism version can be derived.
Hopefully the intention is clear.
Unfortunately, I don't know enough about polars-rs, and to be honest rust in general, at the moment.
The unpacking/re packing in particular is obscure to me..
If there is no serious drawback to this approach - eg. perf cost creating the input struct - then, once the unpacking/repacking and error management is demonstrated it can become a useful boilerplate to start from for many use cases/people.
Whether maintainers are willing to improve this part of the doc or not, I would be grateful for pointers to continue exploring.
from pyo3-polars.
looks like the dev has been done:
870da1e
I think this solves a use case brought forward by polars extension plugin author Ion Koutsouris in this issue #58.
In the meantime I've learnt some Rust and done some research and existing comunity plugins are the best information sources. There I could find exemples with input as a single column of type list
or struct
, and output as struct
. Aggregating several columns into a col of type struct is easy (and cheap in lazy mode, it seems). Same for unpacking a struct
into several columns.
I am not far from testing it and if confirmed, then my naive question above will be answered.
from pyo3-polars.
Does this help? https://marcogorelli.github.io/polars-plugins-tutorial/sum/
I think you're right about the haversine example though, will make a pr to clear it up
from pyo3-polars.
Does this help? https://marcogorelli.github.io/polars-plugins-tutorial/sum/
YES it does !
I read through it all and implemented along the way - not done the suggested extra work yet.
But it is a very valuable help 🙏
For somebody intermediate in rust, beginner in rust polarrs, proficient in python polars, the following sections, in particular, got my attention:
- what is a series ?
- what is a chunk ?
- cumulative sum
- branch mid predictions
- list columns as input
- struct columns as input/ouput
from pyo3-polars.
looks like the dev has been done:
870da1e
But I'm still not sure how we can pass multiple columns to a function.
from pyo3-polars.
Does this help? https://marcogorelli.github.io/polars-plugins-tutorial/sum/
I think you're right about the haversine example though, will make a pr to clear it up
yes thanks a lot.
One more thing that is not clear. it looks like "pig_latinnify_with_paralellism" is not used anywhere, just defined. When do we need to use a library like Rayon to speed things up as one of the advantage of using shared library plugins is to have "Parallelism and optimizations (are) managed by the default polars runtime"?
from pyo3-polars.
it looks like "pig_latinnify_with_paralellism" is not used anywhere, just defined
I have the same question :)
from pyo3-polars.
Related Issues (20)
- Minimal plugin may segfault HOT 2
- Sorted flag not preserved when returning the PyDataFrame to Rust
- Allow to set expression output type based on kwargs HOT 2
- cannot find -lpython3.10: No such file or directory HOT 1
- Kernel panics when passing list(categorical) as input
- Support for Using Polars Extension with `over` Syntax HOT 1
- How to pass kwarg to func output type HOT 4
- plugins' names not respected? HOT 3
- Namespace API for DataFrame/LazyFrame
- Polars 0.26 rc1: plugins panic when passing String input HOT 2
- `ArrowInvalid` when return a `PyDataFrame` to python HOT 2
- LazyFrame::anonymous_scan can't be send to python HOT 1
- Pluggin list
- The first call to pyfunction returning PyDataFrame is slow HOT 3
- Please add examples of passing dataframes between python and rust HOT 1
- Allow for custom datasources as plugins HOT 5
- Template repository for pyo3-polars
- Upgrade to PyO3 0.21.1 HOT 2
- Maintain input series names, in rust, when a plugin is called within .over() context HOT 4
- Upgrade pyo3-polars and friends to PyO3 0.21 and its new Bound<> APIs
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pyo3-polars.