Comments (6)
I think it would actually be easier. The easiest was the old way where we have only Expr::GetIndexedField
and get_field
doesn't exist (no functions). However, that ship has already sailed (the functions are valuable because they allow extensibility).
Today, I have to account for both paths. We use the SQL parser as our input. I don't know (off the top of my head) if it uses Expr
or get_field
and it doesn't really matter because, if I don't account for both, it will inevitably change which one it uses (murphy's law).
So having one path, even if it is a slightly less obvious path, is better than having two paths.
from datafusion.
@alamb Do you think we should also rewrite the array operator to function in parser step? It is currently in optimizer step. I think the downside of moving array rewrite in parser step is that if any user expects different array function with the same syntax, then they can't do it since we don't have "user-defined" parser mechanism now. But the benefit is that we can eliminate intermediate binary expression.
I agree that changing the parser to insert a call to get field access directly is a good idea (and would be consistent and allow us to remove Expr::GetFieldAccess
Have you thought about "user-defined" parser idea before, the way that user can define their own expression to get from the syntax? Is it useful in production? 🤔
One thing I have thought about is changing the hard coded lookup of function names from a pattern like this
datafusion/datafusion/sql/src/expr/value.rs
Lines 144 to 150 in fc34dac
To be something more structured
pub trait PlannerFunctions {
/// return the UDF to use for creating arrays ("make_array") by default:
fn make_array(&self) -> Result<ScalarUDF>;
...
// similar functions for other built in functions
}
And then instead of
if let Some(udf) = self.context_provider.get_function_meta("make_array") {
Ok(Expr::ScalarFunction(ScalarFunction::new_udf(udf, values)))
} else {
not_impl_err!(
"array_expression featrue is disable, So should implement make_array UDF by yourself"
)
}
The planner might look like
let udf = self.planner_functions.make_array()?;
Ok(Expr::ScalarFunction(ScalarFunction::new_udf(udf, values)))
But I haven't had a usecase to do that myself yet
from datafusion.
One potential downside as @westonpace mentioned somewhere I can't find now, is that systems that want to look for field accesses for their own analysis (e.g. to find all nested field references) would find it harder to do so
from datafusion.
@alamb Do you think we should also rewrite the array operator to function in parser step? It is currently in optimizer step.
I think the downside of moving array rewrite in parser step is that if any user expects different array function with the same syntax, then they can't do it since we don't have "user-defined" parser mechanism now. But the benefit is that we can eliminate intermediate binary expression.
Have you thought about "user-defined" parser idea before, the way that user can define their own expression to get from the syntax? Is it useful in production? 🤔
from datafusion.
@alamb Do you think we should also rewrite the array operator to function in parser step? It is currently in optimizer step. I think the downside of moving array rewrite in parser step is that if any user expects different array function with the same syntax, then they can't do it since we don't have "user-defined" parser mechanism now. But the benefit is that we can eliminate intermediate binary expression.
The array operator to function is syntax like array1 || array2
-> array_concat
, which is in ArrayFunctionRewriter now, so I'm thinking about whether we should move this to the parser or not.
from datafusion.
@alamb Do you think we should also rewrite the array operator to function in parser step? It is currently in optimizer step. I think the downside of moving array rewrite in parser step is that if any user expects different array function with the same syntax, then they can't do it since we don't have "user-defined" parser mechanism now. But the benefit is that we can eliminate intermediate binary expression.
The array operator to function is syntax like
array1 || array2
->array_concat
, which is in ArrayFunctionRewriter now, so I'm thinking about whether we should move this to the parser or not.
Let's move the discussion to #10534
from datafusion.
Related Issues (20)
- Convert `Regr` to UDAF HOT 1
- Convert `Correlation` to UDAF HOT 1
- SMJ producing different results than HashJoin when doing a semi join HOT 7
- Construction of user-defined table functions (UDTFs) should be async to allow for async schemas HOT 1
- Real-time streaming support HOT 1
- Data set which is much bigger than RAM HOT 1
- ci: clippy failed on main
- Convert `Grouping` to UDAF HOT 2
- Convert `BitAnd`, `BitOr`, `BitXor` to UDAF HOT 2
- CTE in a UNION query can escape its scope HOT 1
- Unclear error message when calling a function with no parameters.
- [Epic] Implement support for `StringView` in DataFusion HOT 1
- Implement equality `=` and inequality `<>` support for `StringView` HOT 4
- Implement `arrow_cast` support for `StringView` and `BinaryView` HOT 1
- use StringViewArray when reading String columns from Parquet HOT 2
- [EPIC] Continued correct and improved extracting Parquet statistics into ArrayRefs
- Update ListingTable to use `StatisticsConverter`
- `StatisticsConverter::row_group_null_counts` incorrect for missing column
- Support extracting `Int8`, `Int16`, `Int32` statistics from Parquet Data Pages HOT 1
- Do we need to escape search string as it's used in regexp? Wondering what's the result of `contains("abcdefg", ".*")` HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from datafusion.