Git Product home page Git Product logo

Comments (7)

oscar6echo avatar oscar6echo commented on July 25, 2024 1

Title

I concur.

The definition of the column is haversine=pl.col("floats").dist.haversine("floats", "floats", "floats", "floats") with sample column floats [5.6, -1245.8, 242.224].
And the unpacking of the data is:

#[polars_expr(output_type_func=haversine_output)]
fn haversine(inputs: &[Series]) -> PolarsResult<Series> {
    let out = match inputs[0].dtype() {
        DataType::Float32 => {
            let start_lat = inputs[0].f32().unwrap();
            let start_long = inputs[1].f32().unwrap();
            let end_lat = inputs[2].f32().unwrap();
            let end_long = inputs[3].f32().unwrap();
// etc

So from the other examples, we understand that inputs[0] is the column on which the plugin function is applied, and inputs[1, 2, 3] are the first 3 column names, and the last one is not used ! If this is correct, this is a bit misleading, isn't it ?

The pig_latinnify and pig_latinnify_with_parallelism make sense, work as expected (I cannot judge the parallelism) and are didactic demos, and so is the kwargs example.

But I would argue a a potentially very generic sample function with several columns as input and several columns as ouput would be more helpful, and, in terms of API it would quite readable to have the inputs and outputs as pl.struct.

Here is the python shell, super generic:

@pl.api.register_expr_namespace("demo")
class Demo:
    def __init__(self, expr: pl.Expr):
        self._expr = expr

    def demo_calc(self) -> pl.Expr:
        return self._expr._register_plugin(
            lib=lib,
            symbol="demo_calc",
            is_elementwise=True,
            kwargs={},
        )

A vaguely worded rust part would be:

// struct input column
struct FuncStructIn {
    a_float: bool,
    a_int: i64,
    a_string: String,
}

// struct output column
struct FuncStructOut {
    a_valid: bool,
    a_one: f64,
    a_two: f64,
}

// output type is a struct with schema FuncStructOut
#[polars_expr(output_type=FuncStructOut)]
fn demo_calc(inputs: &[Series], kwargs: PigLatinKwargs) -> PolarsResult<Series> {
    
    // unpack struct to individual variables, with all the relevant error checking
    // obviously this does not work but you get the idea
    let s_in: FuncStructIn = inputs[0].struct();

    let a_float : &ChunkedArray<T>= =    s_in.a_float;
    let a_int = s_in.a_int;
    let a_string = s_in.a_string;

    // impl_demo_calc contains the actuall processing
    let s_out: FuncStructOut = impl_demo_calc(a_float, a_int, a_string)

    // returns result
    // again syntax is super approximative
    Ok(s_out.into_series())
}

// approx syntax
pub(super) fn impl_demo_calc(
    a_float: &ChunkedArray<Float64Type>,
    a_int: &ChunkedArray<Int64Type>,
    a_string: &ChunkedArray<Utf8Type>,
) -> PolarsResult<ChunkedArray<FuncStructOut>> {
    // can be anything but return a FuncStructOut shaped struct
    // unless compute error due to bad input
}

// once the this is done, in a second stage, probably a _with parallelism version can be derived.

Hopefully the intention is clear.

Unfortunately, I don't know enough about polars-rs, and to be honest rust in general, at the moment.
The unpacking/re packing in particular is obscure to me..

If there is no serious drawback to this approach - eg. perf cost creating the input struct - then, once the unpacking/repacking and error management is demonstrated it can become a useful boilerplate to start from for many use cases/people.

Whether maintainers are willing to improve this part of the doc or not, I would be grateful for pointers to continue exploring.

from pyo3-polars.

oscar6echo avatar oscar6echo commented on July 25, 2024 1

looks like the dev has been done:
870da1e

I think this solves a use case brought forward by polars extension plugin author Ion Koutsouris in this issue #58.

In the meantime I've learnt some Rust and done some research and existing comunity plugins are the best information sources. There I could find exemples with input as a single column of type list or struct, and output as struct. Aggregating several columns into a col of type struct is easy (and cheap in lazy mode, it seems). Same for unpacking a struct into several columns.

I am not far from testing it and if confirmed, then my naive question above will be answered.

from pyo3-polars.

MarcoGorelli avatar MarcoGorelli commented on July 25, 2024 1

Does this help? https://marcogorelli.github.io/polars-plugins-tutorial/sum/

I think you're right about the haversine example though, will make a pr to clear it up

from pyo3-polars.

oscar6echo avatar oscar6echo commented on July 25, 2024 1

Does this help? https://marcogorelli.github.io/polars-plugins-tutorial/sum/

YES it does !
I read through it all and implemented along the way - not done the suggested extra work yet.
But it is a very valuable help 🙏

For somebody intermediate in rust, beginner in rust polarrs, proficient in python polars, the following sections, in particular, got my attention:

from pyo3-polars.

Macfly avatar Macfly commented on July 25, 2024

looks like the dev has been done:
870da1e

But I'm still not sure how we can pass multiple columns to a function.

from pyo3-polars.

Macfly avatar Macfly commented on July 25, 2024

Does this help? https://marcogorelli.github.io/polars-plugins-tutorial/sum/

I think you're right about the haversine example though, will make a pr to clear it up

yes thanks a lot.
One more thing that is not clear. it looks like "pig_latinnify_with_paralellism" is not used anywhere, just defined. When do we need to use a library like Rayon to speed things up as one of the advantage of using shared library plugins is to have "Parallelism and optimizations (are) managed by the default polars runtime"?

from pyo3-polars.

MarcoGorelli avatar MarcoGorelli commented on July 25, 2024

it looks like "pig_latinnify_with_paralellism" is not used anywhere, just defined

I have the same question :)

from pyo3-polars.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.