Git Product home page Git Product logo

arrow-udf's Introduction

Arrow User-Defined Functions Framework

Easily create and run user-defined functions (UDF) on Apache Arrow. You can define functions in Rust, Python, Java or JavaScript. The functions can be executed natively, or in WebAssembly, or in a remote server.

Language Native WebAssembly Remote
Rust arrow-udf arrow-udf-wasm
Python arrow-udf-python arrow-udf-flight/python
JavaScript arrow-udf-js or arrow-udf-js-deno
Java arrow-udf-flight/java

Extension Types

In addition to the standard types defined by Arrow, these crates also support the following data types through Arrow's extension type. When using extension types, you need to add the ARROW:extension:name key to the field's metadata.

Extension Type Physical Type ARROW:extension:name
JSON Utf8, Binary, LargeBinary arrowudf.json
Decimal Utf8 arrowudf.decimal

Alternatively, you can configure the extension metadata key and values to look for when converting between Arrow and extension types:

let mut js_runtime = arrow_udf_js::Runtime::new().unwrap();
let converter = js_runtime.converter_mut();
converter.set_arrow_extension_key("Extension");
converter.set_json_extension_name("Variant");
converter.set_decimal_extension_name("Decimal");

JSON Type

JSON type is stored in string array in text form.

let json_field = Field::new(name, DataType::Utf8, true)
    .with_metadata([("ARROW:extension:name".into(), "arrowudf.json".into())].into());
let json_array = StringArray::from(vec![r#"{"key": "value"}"#]);

Decimal Type

Different from the fixed-point decimal type built into Arrow, this decimal type represents floating-point numbers with arbitrary precision or scale, that is, the unconstrained numeric in Postgres. The decimal type is stored in a string array in text form.

let decimal_field = Field::new(name, DataType::Utf8, true)
    .with_metadata([("ARROW:extension:name".into(), "arrowudf.decimal".into())].into());
let decimal_array = StringArray::from(vec!["0.0001", "-1.23", "0"]);

Benchmarks

We have benchmarked the performance of function calls in different environments. You can run the benchmarks with the following command:

cargo bench --bench wasm

Performance comparison of calling gcd on a chunk of 1024 rows:

gcd/native          1.5237 µs   x1
gcd/wasm            15.547 µs   x10
gcd/js(quickjs)     85.007 µs   x55
gcd/js(deno)        93.584 µs   x62
gcd/python          175.29 µs   x115

Who is using this library?

  • RisingWave: A Distributed SQL Database for Stream Processing.
  • Databend: An open-source cloud data warehouse that serves as a cost-effective alternative to Snowflake.

arrow-udf's People

Contributors

wangrunji0408 avatar bakjos avatar xuanwo avatar psiace avatar sundy-li avatar maxjustus avatar dependabot[bot] avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.