
sql-udf's People

Contributors

tgross35

sql-udf's Issues

Change how arguments are sent to the traits

Instead of providing a slice from a vec, I think we should:

  • Use the InitId type as the main type (probably typedef from the C version)
  • Add an iter_args() method that returns an SqlArg
  • These types should have methods that grant safe access and modification where needed

Refactor wrapper/macro side

This could really use some adjustment. We have some tricky datatypes to represent:

  • i64 vs. f64 vs. AsRef<[u8]>
  • Option vs not option
  • Reference vs owned

This is currently quite messy; these modules will need some cleanup.

MariaDB does not seem to allocate anything for the length parameter in nested function calls

Running the following:

CREATE FUNCTION uuid_generate_v4 RETURNS string SONAME 'libudf_uuid
CREATE FUNCTION uuid_is_valid RETURNS integer SONAME 'libudf_uuid.so';

select uuid_is_valid(uuid_generate_v4());

Has a weird issue. From my note on Zulip:

Trevor Gross: Hitting something sort of weird. For string result UDFs when setting initid->max_length in x_init, usually it provides an allocated buffer of that length to the x() call. Which is the expected behavior for lengths <= 255 (per the mysql docs) and seems to work correctly

Trevor Gross: However, when wrapping that call inside another call on the SQL side ( Y(X())), it seems like the server says the length of the buffer is 0

Trevor Gross: So for some reason there's a difference in UDF calling behavior when directly calling a function vs. when calling it within another function. Not sure why that would be, I'll try to look into it

Link to example: pluots/udf-suite@c89b47d

Rewrite `SqlArg` to contain only a `&SqlArgList` and an index

This will be a breaking change unfortunately, but for the better. SqlArg currently has a value and an attribute field that are public.

Concept: SqlArg should just take a reference to SqlArgList and have an index. It should then provide value() and attribute() functions to get those values dynamically. That saves the user from paying for a few copies where not needed, and shrinks the SqlArg size
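A minimal sketch of that concept follows. All types here are simplified stand-ins invented for illustration (the real SqlArgList wraps the server's C structures, and the SqlValue enum and get()/iter() helpers are assumptions), but it shows how SqlArg can shrink to a reference plus an index while value() and attribute() look things up on demand:

```rust
/// Hypothetical, simplified argument value (not the crate's real type).
#[derive(Debug, Clone, PartialEq)]
pub enum SqlValue {
    Int(Option<i64>),
    Real(Option<f64>),
    String(Option<Vec<u8>>),
}

/// Stand-in for the argument list; the real one would wrap the C struct.
pub struct SqlArgList {
    values: Vec<SqlValue>,
    attributes: Vec<String>,
}

impl SqlArgList {
    pub fn new(values: Vec<SqlValue>, attributes: Vec<String>) -> Self {
        assert_eq!(values.len(), attributes.len());
        Self { values, attributes }
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }

    /// Returns a lightweight handle, or `None` if `index` is out of range.
    pub fn get(&self, index: usize) -> Option<SqlArg<'_>> {
        (index < self.len()).then(|| SqlArg { list: self, index })
    }

    /// Iterates over handles to every argument without copying any data.
    pub fn iter(&self) -> impl Iterator<Item = SqlArg<'_>> + '_ {
        (0..self.len()).map(move |index| SqlArg { list: self, index })
    }
}

/// Only a reference and an index: two machine words, cheap to copy.
#[derive(Clone, Copy)]
pub struct SqlArg<'a> {
    list: &'a SqlArgList,
    index: usize,
}

impl<'a> SqlArg<'a> {
    /// Looks the value up in the list when asked for it.
    pub fn value(&self) -> &'a SqlValue {
        &self.list.values[self.index]
    }

    /// Looks the attribute (argument name) up in the list when asked for it.
    pub fn attribute(&self) -> &'a str {
        &self.list.attributes[self.index]
    }
}
```

Because the fields are private, a later change to how the list stores its data would not be another breaking change for users.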

Add docker test for examples

Recommended by Daniel from MariaDB:

FROM rust:latest AS build
COPY . /build

ENV CARGO_HOME=/build/.docker-cargo

ARG PROJECT=myrand

WORKDIR /build/$PROJECT

RUN cargo build --release

FROM mariadb:10.8

ARG PROJECT

COPY --from=build /build/$PROJECT/target/release/*.so /usr/lib/mysql/plugin/

Hi. Can you help me with `Option<&'a [u8]>`

I need to return a really long CLOB, but I cannot make it work when the return type is optional.

#[derive(Default)]
struct Test {
    vec: Vec<u8>
}
#[register]
impl BasicUdf for Test {
    type Returns<'a> = Option<&'a [u8]>;

    fn init(_cfg: &UdfCfg<Init>, _args: &ArgList<Init>) -> Result<Self, String> {
        Ok(Self::default())
    }

    fn process<'a>(&'a mut self, _cfg: &UdfCfg<Process>, _args: &ArgList<Process>, _error: Option<NonZeroU8>) -> Result<Self::Returns<'a>, ProcessError> {
        self.vec = (0..255).collect();
        Ok(Some(&self.vec[..]))
    }
}

I got an error

error[E0308]: mismatched types
  --> src/test.rs:88:1
   |
88 | #[register]
   | ^^^^^^^^^^^ one type is more general than the other
   |
   = note: expected enum `Option<&'a [u8]>`
              found enum `Option<&[u8]>`
   = note: this error originates in the attribute macro `register` (in Nightly builds, run with -Z macro-backtrace for more info)

If &'a [u8] is not optional, it compiles just fine.

Miri issues on transmute at `set_type_coercion`

We are hitting this miri error:

test sequence::tests::test_init ... error: Undefined Behavior: constructing invalid value at .<enum-tag>: encountered 0xaa000202, but expected a valid enum tag
   --> /Users/tmgross/Documents/Projects/pluots-text/sql-udf/udf/src/types/arg.rs:110:24
    |
110 |             *arg_ptr = mem::transmute(set_coercion(*arg_ptr as i32, newtype as i32));
    |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ constructing invalid value at .<enum-tag>: encountered 0xaa000202, but expected a valid enum tag
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
    = note: BACKTRACE:
    = note: inside `udf::SqlArg::<'_, udf::Init>::set_type_coercion` at /Users/tmgross/Documents/Projects/pluots-text/sql-udf/udf/src/types/arg.rs:110:24: 110:85

And it is right, we are constructing something "bad" here.

The correct way to express intent is to change the type in the struct.
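As a rough illustration of that direction, the sketch below keeps the raw field as a plain integer, for which every bit pattern is valid, and converts to an enum only through a checked match. The ItemResult enum, its discriminants, and the RawArgType wrapper are assumptions made for this example, not the crate's actual definitions:

```rust
use std::convert::TryFrom;

/// Hypothetical result-type enum; the discriminants are illustrative.
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ItemResult {
    String = 0,
    Real = 1,
    Int = 2,
    Decimal = 4,
}

impl TryFrom<i32> for ItemResult {
    /// The unrecognized raw value is handed back as the error.
    type Error = i32;

    fn try_from(raw: i32) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(Self::String),
            1 => Ok(Self::Real),
            2 => Ok(Self::Int),
            4 => Ok(Self::Decimal),
            other => Err(other),
        }
    }
}

/// Hypothetical wrapper whose field is an integer rather than the enum,
/// so storing extra coercion bits in it never creates an invalid enum.
pub struct RawArgType(pub i32);

impl RawArgType {
    /// Checked view of the stored integer as an enum.
    pub fn as_item_result(&self) -> Result<ItemResult, i32> {
        ItemResult::try_from(self.0)
    }
}
```

With this shape, a value such as the 0xaa000202 seen in the Miri output is reported as an error by the checked conversion instead of being materialized as an enum.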

Potential UB with setting type coercion > once

Setting type coercion then recreating an argument would be UB

Not sure what the best way to do this is. Maybe have one of the wrapper types store a vector of coercions, then flush them at the end
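One possible shape for the "store, then flush" idea is sketched here. The CoercionQueue type, the SqlType enum, and the use of an i32 slice as a stand-in for the server's type array are all assumptions for illustration. Requests are only recorded while the user's init code runs, and the raw array is written once at the end:

```rust
/// Hypothetical SQL type enum; the discriminants are illustrative.
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SqlType {
    String = 0,
    Real = 1,
    Int = 2,
    Decimal = 4,
}

/// Records coercion requests instead of writing to the raw array right away.
#[derive(Default)]
pub struct CoercionQueue {
    pending: Vec<(usize, SqlType)>,
}

impl CoercionQueue {
    /// Remembers that argument `index` should be coerced to `ty`.
    pub fn request(&mut self, index: usize, ty: SqlType) {
        self.pending.push((index, ty));
    }

    /// Applies all requests in order, so a later request for the same index
    /// overrides an earlier one. Out-of-range indices are ignored.
    pub fn flush(self, raw_types: &mut [i32]) {
        for (index, ty) in self.pending {
            if let Some(slot) = raw_types.get_mut(index) {
                *slot = ty as i32;
            }
        }
    }
}
```

Since flush consumes the queue, it can only run once, after every argument handle has been dropped.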

`#[udf]` macro for simple functions

It would be nice to provide a simplified interface for the most common needs:

#[udf]
fn my_udf(a: Option<&str>, b: f64, c: &str) -> Option<i32> {
}

#[udf]
fn my_udf2(cache: &mut Vec<u8>, a: Option<&str>, b: f64, c: &str) -> Result<&[u8], String> {
}

// If init is not specified, default to Default
#[udf(init = MyStruct::new())]
fn my_udf3(cache: &mut MyStruct, a: Option<&str>, b: f64, c: &str) -> &[u8] {
}

// desugars to something like
mod my_udf_mod {
    struct MyUdf(Vec<u8>);

    #[register]
    impl BasicUdf for MyUdf {
        returning = Option<i32>;
        fn init(...) { /* autogenerated, verify types */ }
        fn process(&mut self...) {
            // generate user defined types
            let arg1 = args.get(0).as_str();
            my_udf(self.0, arg1, arg2, arg3)
        }
    }
    
    fn my_udf(a: Option<&str>, b: f64, c: &str) -> Option<i32> {
    }
}

The work would be entirely in the proc macro and would just desugar to our current traits. Some way to define >1 function signature would also be cool, but I don't quite know how to combine them together without having them in the same block

UnsafeCell Usage

Our wrapper structs should probably use struct SafeStruct(UnsafeCell<c_struct>), which tells the compiler that the wrapped C struct may be mutated through a shared reference (interior mutability).
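A small sketch of what such a wrapper could look like is below. CInitId and its fields are invented stand-ins for the server's C struct, and SafeInit, from_ptr, and the accessor names are hypothetical:

```rust
use std::cell::UnsafeCell;

/// Hypothetical stand-in for a C struct owned by the server.
#[repr(C)]
pub struct CInitId {
    pub max_length: u64,
    pub maybe_null: bool,
}

/// `repr(transparent)` keeps the layout identical to `CInitId`, so a pointer
/// to the C struct can be reinterpreted as a pointer to the wrapper.
#[repr(transparent)]
pub struct SafeInit(UnsafeCell<CInitId>);

impl SafeInit {
    /// # Safety
    /// `ptr` must be non-null, aligned, and valid for reads and writes for
    /// the whole lifetime `'a`.
    pub unsafe fn from_ptr<'a>(ptr: *mut CInitId) -> &'a Self {
        unsafe { &*ptr.cast::<Self>() }
    }

    pub fn max_length(&self) -> u64 {
        // Reading through the cell's raw pointer.
        unsafe { (*self.0.get()).max_length }
    }

    /// Writing through `&self` is permitted because of the `UnsafeCell`.
    pub fn set_max_length(&self, value: u64) {
        unsafe { (*self.0.get()).max_length = value }
    }
}
```

UnsafeCell also makes the wrapper !Sync, which seems appropriate for structs the server hands to one call at a time.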

Add Mock Generators

We probably want a mock for each possible argument. These include:

  • UdfCfg<Init>
  • UdfCfg<Process>
  • ArgList<Init>
  • ArgList<Process>

We probably want to have these implemented as something like

pub struct MockUdfCfg {
    // add fields here
}

impl MockUdfCfg {
    // Signatures only for now; the bodies still need to be written
    pub fn as_init(&mut self) -> &UdfCfg<Init> { todo!() }
    pub fn as_process(&mut self) -> &UdfCfg<Process> { todo!() }
}

pub struct MockArgList{}

Store owned buffers in the Box instead of having the user manage it

We'll do this for rev 0.5

Instead of having the user make something like

struct MyUdf {
    ret: String
}

Let's just use something like this:

struct OwnedBuffer<U, R> {
    data: U,
    ret: R
}

We would store that in the Box. This will mitigate #39 and also mean we don't need to use GATs anymore, greatly simplifying the function signatures.
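To make the idea concrete, here is a rough sketch under several assumptions: the SimpleUdf trait, the wrap_init and wrap_process helpers, and the Counter example are all hypothetical and much simpler than the real traits. The point is only that process can return an owned value, which the wrapper keeps alive in ret so the bytes handed back stay valid:

```rust
/// Wrapper the library would allocate in a `Box` during init.
pub struct OwnedBuffer<U, R> {
    data: U, // the user's UDF state
    ret: R,  // the most recent owned return value
}

/// Hypothetical simplified trait: `process` returns an owned value,
/// so the associated type needs no lifetime parameter.
pub trait SimpleUdf: Sized {
    type Returns: Default + AsRef<[u8]>;
    fn init() -> Self;
    fn process(&mut self) -> Self::Returns;
}

/// What a generated init wrapper might do.
pub fn wrap_init<U: SimpleUdf>() -> Box<OwnedBuffer<U, U::Returns>> {
    Box::new(OwnedBuffer {
        data: U::init(),
        ret: U::Returns::default(),
    })
}

/// What a generated process wrapper might do: store the result in the
/// buffer, then hand out bytes that live as long as the buffer borrow.
pub fn wrap_process<U: SimpleUdf>(buf: &mut OwnedBuffer<U, U::Returns>) -> &[u8] {
    buf.ret = buf.data.process();
    buf.ret.as_ref()
}

/// Example user type: no `ret: String` field is needed any more.
pub struct Counter(u32);

impl SimpleUdf for Counter {
    type Returns = String;

    fn init() -> Self {
        Counter(0)
    }

    fn process(&mut self) -> String {
        self.0 += 1;
        format!("call {}", self.0)
    }
}
```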

Proc macro layout

I have come to the conclusion that a user must put #[register] on both impl blocks.

The register on BasicUdf will:

  • Make init, deinit, and process C functions
  • Add impl UdfRegistered for StructName

And for the register on AggUdf:

Going to worry about things like renaming the function later
