pluots / sql-udf
A wrapper for writing MariaDB/MySQL user defined functions in Rust
License: Other
Instead of providing a slice from a vec, I think we should have an iter_args() method that returns an SqlArg
This could really use some adjustment. We have some tricky datatypes to represent:
This is currently quite messy, these modules will need some cleanup
Running the following:
CREATE FUNCTION uuid_generate_v4 RETURNS string SONAME 'libudf_uuid
CREATE FUNCTION uuid_is_valid RETURNS integer SONAME 'libudf_uuid.so';
select uuid_is_valid(uuid_generate_v4());
Has a weird issue. From my note on zulip:
Trevor Gross: Hitting something sort of weird. For string result UDFs when setting initid->max_length in x_init, usually it provides an allocated buffer of that length to the x() call. Which is the expected behavior for lengths <= 255 (per the mysql docs) and seems to work correctly
Trevor Gross: However, when wrapping that call inside another call on the SQL side ( Y(X())), it seems like the server says the length of the buffer is 0
Trevor Gross: So for some reason there's a difference in UDF calling behavior when directly calling a function vs. when calling it within another function. Not sure why that would be, I'll try to look into it
Link to example: pluots/udf-suite@c89b47d
Panicking into a FFI boundary is UB, so we need to catch any panics at the boundary.
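A minimal sketch of catching a panic before it crosses the boundary, using std::panic::catch_unwind. The names user_process and example_process are hypothetical, and the error out-parameter is a simplified stand-in for the error flag a UDF entry point receives; it is not the wrapper's actual signature.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical user-level logic that may panic.
fn user_process(input: i64) -> i64 {
    if input < 0 {
        panic!("negative input");
    }
    input * 2
}

/// Hypothetical C-ABI entry point. Any panic raised by `user_process` is
/// caught here and reported through `error` instead of unwinding into
/// the caller on the other side of the FFI boundary.
pub extern "C" fn example_process(input: i64, error: &mut u8) -> i64 {
    // AssertUnwindSafe: the closure only captures a Copy integer, so no
    // state can be observed half-updated after a panic.
    match catch_unwind(AssertUnwindSafe(|| user_process(input))) {
        Ok(value) => value,
        Err(_) => {
            // Signal failure to the caller and return a dummy value.
            *error = 1;
            0
        }
    }
}
```

Note that catch_unwind only helps when panics unwind; with panic = "abort" the process terminates instead, which is also not UB.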
This will be a breaking change unfortunately, but for the better. SqlArg currently has value and attribute fields that are public.
Concept: SqlArg should just take a reference to SqlArgList and have an index. It should then provide value() and attribute() functions to get those values dynamically. That saves the user from paying for a few copies where not needed, and shrinks the size of SqlArg.
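A rough sketch of that shape, under stated assumptions: ArgListSketch and ArgSketch are hypothetical stand-ins for SqlArgList and SqlArg, and the Vec-backed storage replaces whatever raw server structure the real list wraps. It only illustrates "reference plus index, values fetched on demand" together with an iter_args() method.

```rust
/// Hypothetical stand-in for the argument list; the real one would wrap
/// the server's raw argument struct rather than owning Vecs.
pub struct ArgListSketch {
    values: Vec<Option<Vec<u8>>>,
    attributes: Vec<String>,
}

/// Hypothetical stand-in for a single argument: just a reference to the
/// list plus an index, so it is two words in size and copies nothing.
pub struct ArgSketch<'a> {
    list: &'a ArgListSketch,
    index: usize,
}

impl<'a> ArgSketch<'a> {
    /// Look the value up in the list when asked for it.
    pub fn value(&self) -> Option<&'a [u8]> {
        let list: &'a ArgListSketch = self.list;
        list.values[self.index].as_deref()
    }

    /// Look the attribute (argument name) up in the list when asked for it.
    pub fn attribute(&self) -> &'a str {
        let list: &'a ArgListSketch = self.list;
        &list.attributes[self.index]
    }
}

impl ArgListSketch {
    /// Both vectors are expected to have the same length.
    pub fn new(values: Vec<Option<Vec<u8>>>, attributes: Vec<String>) -> Self {
        assert_eq!(values.len(), attributes.len());
        Self { values, attributes }
    }

    /// Iterate over lightweight argument handles instead of a slice.
    pub fn iter_args(&self) -> impl Iterator<Item = ArgSketch<'_>> + '_ {
        (0..self.values.len()).map(move |index| ArgSketch { list: self, index })
    }
}
```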
Recommended by Daniel from MariaDB:
FROM rust:latest AS build
COPY . /build
ENV CARGO_HOME=/build/.docker-cargo
ARG PROJECT=myrand
WORKDIR /build/$PROJECT
RUN cargo build --release
FROM mariadb:10.8
ARG PROJECT
COPY --from=build /build/$PROJECT/target/release/*.so /usr/lib/mysql/plugin/
The Udf part is just kind of redundant
I need to return a really long CLOB, but I cannot make it work if the return type is optional.
#[derive(Default)]
struct Test {
    vec: Vec<u8>
}
#[register]
impl BasicUdf for Test {
    type Returns<'a> = Option<&'a [u8]>;
    fn init(_cfg: &UdfCfg<Init>, _args: &ArgList<Init>) -> Result<Self, String> {
        Ok(Self::default())
    }
    fn process<'a>(&'a mut self, _cfg: &UdfCfg<Process>, _args: &ArgList<Process>, _error: Option<NonZeroU8>) -> Result<Self::Returns<'a>, ProcessError> {
        self.vec = (0..255).collect();
        Ok(Some(&self.vec[..]))
    }
}
I got an error
error[E0308]: mismatched types
--> src/test.rs:88:1
|
88 | #[register]
| ^^^^^^^^^^^ one type is more general than the other
|
= note: expected enum `Option<&'a [u8]>`
found enum `Option<&[u8]>`
= note: this error originates in the attribute macro `register` (in Nightly builds, run with -Z macro-backtrace for more info)
If &'a [u8] is not optional, it compiles just fine.
This will be a helper in our wrapper
We are hitting this miri error:
test sequence::tests::test_init ... error: Undefined Behavior: constructing invalid value at .<enum-tag>: encountered 0xaa000202, but expected a valid enum tag
--> /Users/tmgross/Documents/Projects/pluots-text/sql-udf/udf/src/types/arg.rs:110:24
|
110 | *arg_ptr = mem::transmute(set_coercion(*arg_ptr as i32, newtype as i32));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ constructing invalid value at .<enum-tag>: encountered 0xaa000202, but expected a valid enum tag
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: BACKTRACE:
= note: inside `udf::SqlArg::<'_, udf::Init>::set_type_coercion` at /Users/tmgross/Documents/Projects/pluots-text/sql-udf/udf/src/types/arg.rs:110:24: 110:85
And it is right, we are constructing something "bad" here.
The correct way to express intent is to change the type in the struct.
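One way this could look, as a sketch only: keep the field as a plain integer and convert to the enum through a checked TryFrom, so an unexpected bit pattern becomes an error rather than an invalid enum value. ItemResult and RawArgType are hypothetical names here, and the discriminants are assumed to mirror the server's Item_result values.

```rust
use std::convert::TryFrom;

/// Hypothetical mirror of the server's result-type enum (discriminants
/// are an assumption for this sketch).
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ItemResult {
    String = 0,
    Real = 1,
    Int = 2,
    Row = 3,
    Decimal = 4,
}

impl TryFrom<i32> for ItemResult {
    type Error = i32;

    /// Checked conversion: unknown values are returned as the error.
    fn try_from(raw: i32) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(ItemResult::String),
            1 => Ok(ItemResult::Real),
            2 => Ok(ItemResult::Int),
            3 => Ok(ItemResult::Row),
            4 => Ok(ItemResult::Decimal),
            other => Err(other),
        }
    }
}

/// Hypothetical slot type: the struct field is an integer, so writing any
/// bit pattern into it (for example a value with extra flag bits) is fine.
pub struct RawArgType(pub i32);

impl RawArgType {
    /// Interpret the stored integer only when it is actually read.
    pub fn get(&self) -> Result<ItemResult, i32> {
        ItemResult::try_from(self.0)
    }
}
```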
Anywhere that currently returns a String from the user should accept AsRef<[u8]> to allow use of String or &'static str
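For illustration, a small sketch of what such a bound allows. store_result is a hypothetical helper, not part of the crate; it just shows that String, &'static str, and Vec<u8> all satisfy AsRef<[u8]>.

```rust
/// Hypothetical helper: copy any byte-like value into a reusable buffer
/// and return the number of bytes stored.
pub fn store_result<T: AsRef<[u8]>>(buf: &mut Vec<u8>, value: T) -> usize {
    let bytes: &[u8] = value.as_ref();
    buf.clear();
    buf.extend_from_slice(bytes);
    buf.len()
}
```

A caller can then return an owned String for formatted messages and a string literal for fixed ones without any conversion on their side.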
Setting type coercion then recreating an argument would be UB
Not sure what the best way to do this is. Maybe have one of the wrapper types store a vector of coercions, then flush them at the end
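A possible shape for the "store, then flush" idea, as a sketch under assumptions: CoercionQueue is a hypothetical type, and a plain slice of integers stands in for the raw argument-type array that would be written at the end of init.

```rust
/// Hypothetical queue of requested coercions, held by a wrapper type.
#[derive(Default)]
pub struct CoercionQueue {
    /// (argument index, requested type as a raw integer)
    pending: Vec<(usize, i32)>,
}

impl CoercionQueue {
    /// Record a request without touching the underlying array yet.
    pub fn request(&mut self, index: usize, new_type: i32) {
        self.pending.push((index, new_type));
    }

    /// Apply all requests in order (a later request for the same index
    /// wins) and empty the queue. Out-of-range indices are skipped.
    /// Returns how many writes were performed.
    pub fn flush(&mut self, arg_types: &mut [i32]) -> usize {
        let mut applied = 0;
        for (index, new_type) in self.pending.drain(..) {
            if let Some(slot) = arg_types.get_mut(index) {
                *slot = new_type;
                applied += 1;
            }
        }
        applied
    }
}
```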
It would be nice to provide a simplified interface for the most common needs:
#[udf]
fn my_udf(a: Option<&str>, b: f64, c: &str) -> Option<i32> {
}
#[udf]
fn my_udf2(cache: &mut Vec<u8>, a: Option<&str>, b: f64, c: &str) -> Result<&[u8], String> {
}
// If init is not specified, default to Default
#[udf(init = MyStruct::new())]
fn my_udf3(cache: &mut MyStruct, a: Option<&str>, b: f64, c: &str) -> &[u8] {
}
// desugars to something like
mod my_udf_mod {
    struct MyUdf(Vec<u8>);
    #[register]
    impl BasicUdf for MyUdf {
        returning = Option<i32>;
        fn init(...) { /* autogenerated, verify types */ }
        fn process(&mut self...) {
            // generate user defined types
            let arg1 = args.get(0).as_str();
            my_udf(self.0, arg1, arg2, arg3)
        }
    }
    fn my_udf(a: Option<&str>, b: f64, c: &str) -> Option<i32> {
    }
}
The work would be entirely in the proc macro and would just desugar to our current traits. Some way to define >1 function signature would also be cool, but I don't quite know how to combine them together without having them in the same block
Our wrapper structs should probably use struct SafeStruct(UnsafeCell<c_struct>), which tells the compiler that the wrapped value may have interior mutability
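A minimal sketch of that pattern. RawInit and InitWrapper are hypothetical names and the two fields are made up for the example; the point is only that writes through a shared reference go via UnsafeCell::get.

```rust
use std::cell::UnsafeCell;

/// Hypothetical C-layout struct standing in for a server-owned struct.
#[repr(C)]
pub struct RawInit {
    pub maybe_null: bool,
    pub max_length: u64,
}

/// Wrapper with the same layout as `RawInit`. Because the contents sit in
/// an `UnsafeCell`, the compiler may not assume they are unchanged while a
/// shared reference exists.
#[repr(transparent)]
pub struct InitWrapper(UnsafeCell<RawInit>);

impl InitWrapper {
    pub fn new(raw: RawInit) -> Self {
        Self(UnsafeCell::new(raw))
    }

    pub fn max_length(&self) -> u64 {
        // SAFETY: `UnsafeCell` makes this type `!Sync`, and no reference
        // into the cell is held across this read.
        unsafe { (*self.0.get()).max_length }
    }

    pub fn set_max_length(&self, value: u64) {
        // SAFETY: same reasoning as above; the write is a single field
        // store with no outstanding references into the cell.
        unsafe { (*self.0.get()).max_length = value }
    }
}
```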
We probably want a mock for each possible argument. These include:
UdfCfg<Init>
UdfCfg<Process>
ArgList<Init>
ArgList<Process>
We probably want to have these implemented as something like
pub struct MockUdfCfg {
    // add fields here
}
impl MockUdfCfg {
    pub fn as_init(&mut self) -> &UdfCfg<Init>;
    pub fn as_process(&mut self) -> &UdfCfg<Process>;
}
pub struct MockArgList {}
We'll do this for rev 0.5
Instead of having the user make something like
struct MyUdf {
ret: String
}
Let's just use something like this:
struct OwnedBuffer<U, R> {
data: U,
ret: R
}
That we store in the Box. This will mitigate #39 and also mean we don't need to use GATs anymore, greatly simplifying the function signatures.
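A sketch of how that could work, with assumptions called out: the run method, the Default bound, and the closure-based call are all hypothetical. The idea shown is that user code returns an owned value, the wrapper parks it in ret, and the borrowed bytes handed onward live as long as the buffer, so the user-facing signature needs no lifetime parameter.

```rust
/// User state plus the most recent return value, owned together.
pub struct OwnedBuffer<U, R> {
    data: U,
    ret: R,
}

impl<U, R: Default + AsRef<[u8]>> OwnedBuffer<U, R> {
    /// Start with an empty return slot (hypothetical constructor).
    pub fn new(data: U) -> Self {
        Self { data, ret: R::default() }
    }

    /// Run the user's processing step, keep its owned result alive in
    /// `ret`, and lend out the bytes for as long as `self` is borrowed.
    pub fn run<F>(&mut self, process: F) -> &[u8]
    where
        F: FnOnce(&mut U) -> R,
    {
        self.ret = process(&mut self.data);
        self.ret.as_ref()
    }
}
```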
I have come to the conclusion that a user must put #[register] on both impl blocks
The register on BasicUdf will:
impl UdfRegistered for StructName
And for the register on AggUdf:
Going to worry about things like renaming the function later