Comments (4)
Hey @Leonti, thanks for reporting this. Right now, SQLGlot generates mostly JSONPath functions when parsing semi-structured accesses (e.g. GET_PATH
for VARIANT in Snowflake) instead of preserving the colon/dot notation.
However, I think the Snowflake -> Trino
transpilation is more nuanced and will probably require type inference to work properly, so the fix will be a best-effort attempt without a schema.
from sqlglot.
Quick follow up to gain more context, could you provide an example of how your data is structured in Snowflake and in what way would they be transformed to Trino's ROW
type instead of JSON
?
from sqlglot.
Hi @VaggelisD!
Thanks for looking into it so quickly!
I have an Iceberg table in S3 which is integrated with Snowflake.
The table has a couple of columns which are of type STRUCT
in Iceberg, which translates into VARIANT
when integrated with Snowflake. If I need to get access to nested fields using Snowflake I would do something like:
select payload:nestedField:nestedNestedField from iceberg_table;
because it's a VARIANT
type.
Because fields are nested within a a STRUCT
I've created a "convenience view" in snowflake that flattens the structure, something like this:
select
payload:nestedField:nestedNestedField as field,
payload:anotherField as field2,
-- and so on, potentially tens of fields
from iceberg_table;
Since it's an Iceberg table it can also be queried directly from Athena/Trino. In Trino Iceberg/Parquet STRUCT
is represented as ROW
type, and instead of using :
one would use .
notation to access nested fields.
Since I already have a bunch of "convenience views" for Snowflake, I was hoping to use sqlglot
to convert them to query the Iceberg table using Athena/Trino instead of converting them by hand.
When using Iceberg or Parquet on S3 the STRUCT
type gets converted into Snowflake's VARIANT
or Trino's ROW
directly, so JSON is not involved there at all.
from sqlglot.
Thanks for the detailed explanation! Did some research over the last days and came to the conclusion that there's ambiguity in transpiling Snowflake's VARIANT
as it can store practically any data type. For example, it can store semi-structured objects such as JSON
(in which case, the transpilation to Trino's JSON_EXTRACT
is correct) or a structured OBJECT
in the case of Iceberg's STRUCT
, which should be transpiled as you've explained.
Since SQLGlot cannot analyze the data to infer what's stored under VARIANT
, I'll attempt to make the transpilation configurable such that you can generate both versions (JSON_EXTRACT
and dot notation, depending on the setting) on Snowflake -> Presto/Trino, hopefully that will help.
from sqlglot.
Related Issues (20)
- DuckDB: Optimizer recommends `EXPLODE` function but it does not exist HOT 2
- [Snowflake]: Changed place of 'NOT' in 'WHERE' statement HOT 2
- Support Athena UNLOAD...TO HOT 4
- MOD function is replaced by % in ORACLE. It shouldn't be. HOT 1
- Parsing APPLY doubles execution time per APPLY statement HOT 2
- DATE_PART(ISO_WEEK, {date}) not transpiling correctly from TSQL to Databricks (Spark version >= 3.0). HOT 1
- generate_series() is not transpiled when going from postgres to mysql HOT 1
- [hive]Multi-insert will error when parse_one HOT 1
- Overlapping disjunctions are not optimized HOT 5
- NULL LAST DESC gives error - invalid expression in parse_one using read = postgres HOT 2
- Annotate types not working with subqueries/CTEs HOT 2
- Null order in union seems to affect data type inference HOT 1
- Is Splunk SPL a dialcect that might be added soon? HOT 1
- add benchmark for tree-sitter-sql HOT 1
- Sqlglot optimize is not working. HOT 1
- Simplify the lineage output in case of cte HOT 3
- Add `MINUS` keyword support for `Spark` and related dialects
- ORACLE : Column name in group by replaced by the column number when an alias with the same name as the column is used HOT 2
- in ORACLE queries, SYSDATE is replaced by CURRENT_TIMESTAMP. These 2 functions are not the same.
- Failure to parse complex UNPIVOT statement with Oracle Query
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from sqlglot.