Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
License: BSD 2-Clause "Simplified" License
The objective of Coral-Spark is to translate a view's logical plan to Spark SQL.
Unfortunately, the entry point to Coral is RelNode, which is hard to construct from a view's logical plan.
I've looked into the test cases, and it seems we cannot live without Hive: HiveToRelConverter needs a full-fledged Hive system. But to build a RelNode we only need Spark's catalog (which can be anything: HMS, SparkCatalog, or any third-party catalog).
However, I have a requirement to convert a pure LogicalPlan -> String to re-generate SQL.
Any ideas on that?
We had a few offline conversations about the n-to-one-to-n proposal, which is to support translating any engine's query to any other engine's query.
This issue is the feature request to introduce Presto SQL as an input query language: translate it into the intermediate Coral representation, from which it can be translated to the other engines Coral already supports as output.
There are a few stages for this feature:
The gson dependency at version 2.8.1 is affected by CVE GHSA-4jrv-ppp4-jm57.
Hello,
Sorry for leaving a message here, but I couldn't find a place to ask about good usage (or examples) of this project.
Is there some kind of open discussion channel (Gitter, Discord, ...)?
There is a Slack link, but it doesn't seem to be open.
Thanks.
Currently, Coral supports unnesting an operand into a single row type.
Example: for a table default.struct defined as default.struct(a int, b array<struct<b1:string, b2:int>>), the input query
SELECT * from default.struct cross join unnest(struct.b) AS t(ccol)
has a CoralIR representation.
However, OSS Trino also supports expanding the unnest operand into multiple columns, yet Coral doesn't accept the following as valid Trino input SQL: SELECT * from default.struct cross join unnest(struct.b) AS t(ccol1, ccol2)
A potential solution is to introduce two types of uncollect RelNodes in CoralIR, (1) Uncollect1 and (2) UncollectN, to determine how many columns unnest() should expand to (one or more).
Hive --> Presto translation for a view containing CASE expressions of the following form fails:
SELECT CASE n_regionkey WHEN 0 THEN 'Africa' WHEN 1 THEN 'America' END region_name FROM nation;
More details and a stack trace in trinodb/trino#5837.
However, CASE expressions of the form CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END are translated correctly.
I am having a problem with Hive view translation to Trino that results in a query with an ambiguous column.
I narrowed it down to a simple view that reproduces the issue, and I also identified why it happens: a combination of how the SELECT is rewritten and the column aliases used.
Conditions:
Given these 2 tables:
create table hive.test.table_a (
some_id string
);
create table hive.test.table_b (
some_id string
);
These 2 views will have the same issue:
-- Hive view version 1
create or replace view test.view_ab as
select a.some_id
from test.table_a a
left join
(
select trim(some_id) AS SOME_ID
from test.table_b
) b
on a.some_id = b.some_id
where a.some_id != ''
-- Hive view version 2
create or replace view test.view_ab2 as
select a.some_id
from test.table_a a
left join
(
select some_id AS SOME_ID
from test.table_b
) b
on a.some_id = trim(b.some_id)
where a.some_id != ''
The translated Hive view in Trino will look like this:
CREATE VIEW hive.test.view_ab SECURITY DEFINER AS
SELECT some_id
FROM
(
SELECT
table_a.some_id some_id
, t.SOME_ID SOME_ID
FROM
(test.table_a
LEFT JOIN (
SELECT
TRIM(some_id) SOME_ID
, CAST(TRIM(some_id) AS VARCHAR(65536)) $f1
FROM
test.table_b
) t ON (table_a.some_id = t.$f1))
) t0
WHERE (t0.some_id <> '')
As you can see, there are two columns with the same alias in the first sub-query:
table_a.some_id some_id
, t.SOME_ID SOME_ID
The problem goes away if the alias is specified in lower case in the Hive view definition, which would seem like an easy fix unless you have 50k+ views owned by many different teams across the organization.
I managed to fix the issue in my local environment by lower-casing the node text for identifiers in the com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder class, but I am not sure if this is the best place to do it, or the best way to resolve the problem. Would that be an acceptable change?
@Override
protected SqlNode visitIdentifier(ASTNode node, ParseContext ctx) {
  // Normalize identifier text so aliases compare consistently regardless of case
  return new SqlIdentifier(node.getText().toLowerCase(Locale.ROOT), ZERO);
}
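As a side note on the snippet above, Locale.ROOT is the right choice there: String.toLowerCase with the default locale is locale-sensitive, and under a Turkish locale the letter 'I' lower-cases to a dotless 'ı', silently changing the identifier. A minimal standalone illustration (plain Java, not Coral code):

```java
import java.util.Locale;

public class LocaleLowerDemo {
    public static void main(String[] args) {
        String alias = "SOME_ID";
        // Locale-independent lower-casing: always "some_id"
        System.out.println(alias.toLowerCase(Locale.ROOT));
        // Turkish locale maps 'I' to dotless 'ı', producing "some_ıd"
        System.out.println(alias.toLowerCase(new Locale("tr")));
    }
}
```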
The table structure is as follows:
public static final TestTable TABLE_NULLSCOLLATION = new TestTable("nullscollation", ImmutableMap.of("name", SqlTypeName.VARCHAR,
"id", SqlTypeName.BIGINT, "fans", SqlTypeName.BIGINT, "log_date", SqlTypeName.VARCHAR));
The table data is as follows:
|NULL|456789|NULL|20210426|
|24784|556789|14535|20210426|
|NULL|33456789|NULL|20210426|
|96344|454888|14255|20210426|
|96344|454888|14255|20210426|
|NULL|7556789|NULL|20210426|
Raw SQL:
SELECT name, id, fans FROM nullscollation
WHERE log_date = 20210426
ORDER BY fans DESC
LIMIT 3
Converted SQL:
SELECT "name", "id", "fans"
FROM "nullscollation"
WHERE TRY_CAST("log_date" AS INTEGER) = 20210426
ORDER BY "fans" DESC NULLS FIRST
LIMIT 3
Original SQL result using Hive or Presto:
|24784|556789|14535|
|96344|454888|14255|
|96344|454888|14255|
Converted SQL result using Presto:
|NULL|456789|NULL|
|NULL|33456789|NULL|
|NULL|7556789|NULL|
Hive defaults to NULLS LAST for DESC and NULLS FIRST for ASC. Calcite's defaultNullCollation is HIGH (NULLS FIRST for DESC, NULLS LAST for ASC), and Coral does not set a null collation. We can set the null collation to NullCollation.LOW when creating the Hive SQL validator; that makes the query results match.
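The difference is easy to see with plain Java collections (a sketch over the sample "fans" column above, not Coral code): Hive's DESC ordering treats nulls as lowest (NULLS LAST), while Calcite's HIGH collation treats them as highest (NULLS FIRST), which flips the LIMIT 3 result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NullCollationDemo {
    public static void main(String[] args) {
        // "fans" column from the example table, null rows included
        List<Integer> fans = Arrays.asList(null, 14535, null, 14255, 14255, null);

        // Hive's default for ORDER BY ... DESC: nulls sort last
        List<Integer> hive = new ArrayList<>(fans);
        hive.sort(Comparator.nullsLast(Comparator.<Integer>reverseOrder()));

        // Calcite's HIGH collation for DESC: nulls sort first
        List<Integer> calciteHigh = new ArrayList<>(fans);
        calciteHigh.sort(Comparator.nullsFirst(Comparator.<Integer>reverseOrder()));

        System.out.println(hive.subList(0, 3));        // [14535, 14255, 14255]
        System.out.println(calciteHigh.subList(0, 3)); // [null, null, null]
    }
}
```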
SQL:
select
countdate, -- statistics date
fstrmerchantid_name,
SUBSTRING(fstrmerchantid_name,1,instr(fstrmerchantid_name,'-')-1) as fstrmerchantid,
fuifundprojectid_name,
SUBSTRING(fuifundprojectid_name,1,instr(fuifundprojectid_name,'-')-1) as fuifundprojectid,
l0_fuiconfigquota,
l0_fuifundprojectidnames,
l0_fuiquotatypedesc,
l0_fstrfailunconfirmamounttypedesc,
l0_fuiwaitlendamount,
l0_fuilendprocessingamount,
l0_fuilendedamount,
l0_fuiremaintolendamount,
l1_fuiconfigquota,
l1_fstrtotalamounttypedesc,
l1_fstrfailunconfirmamounttypedesc,
l1_fuiwaitlendamount,
l1_fuilendprocessingamount,
l1_fuilendedamount,
l1_fuiremaintolendamount
from xy_mart.fact_loan_fund_lend_quota
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
Code:
RelNode relNode = trinoToRelConverter.convertSql(a);
RelToTrinoConverter relToTrinoConverter = new RelToTrinoConverter();
String expandedSql = relToTrinoConverter.convert(relNode);
Hive supports CONCAT(date, varchar), while Trino only supports concat(char(x), char(y)), concat(array(E), E), concat(E, array(E)), concat(array(E)), concat(varchar), and concat(varbinary).
We need to handle this corner case (and maybe other similar ones) for Coral-Trino.
Currently, if we union the following 2 fields:
type: int, default: 1
and
type: [null, int], default: null
the returned field is:
type: [int, null], default: 1
this behavior may need to be adjusted to:
You can use the following test case to reproduce the issue.
public static final TestTable TABLE_FIVE = new TestTable("dwd_ogv_app_play_click_tf_inc_d", ImmutableMap.of("r_type", SqlTypeName.INTEGER,
"ogv_type_name", SqlTypeName.VARCHAR, "buvid", SqlTypeName.VARCHAR, "log_date", SqlTypeName.VARCHAR));
@Test
public void testSelect() {
String sql = String.format("select t.ogv_type_name, count(1) as user_amt from (SELECT ogv_type_name,\n" +
"buvid\n" +
"FROM dwd_ogv_app_play_click_tf_inc_d\n" +
"WHERE log_date = '20210312'\n" +
"AND r_type = 1\n" +
"AND ogv_type_name IS NOT NULL\n" +
"GROUP BY ogv_type_name, buvid) t GROUP BY\n" +
"t.ogv_type_name", tableFive);
String expectedSql = "";
testConversion(sql, expectedSql);
}
After conversion, the Presto query is:
SELECT "ogv_type_name" AS "OGV_TYPE_NAME", COUNT(*) AS "USER_AMT"
FROM "dwd_ogv_app_play_click_tf_inc_d"
WHERE "log_date" = '20210312' AND "r_type" = 1 AND "ogv_type_name" IS NOT NULL
GROUP BY "ogv_type_name"
The semantics of the two queries are clearly inconsistent.
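The divergence can be sketched with plain Java streams over hypothetical rows (not Coral code): the original query counts distinct (ogv_type_name, buvid) pairs because the inner GROUP BY deduplicates, while the converted query counts raw rows.

```java
import java.util.Arrays;
import java.util.List;

public class GroupBySemanticsDemo {
    public static void main(String[] args) {
        // Hypothetical (ogv_type_name, buvid) rows; user u1 clicked twice
        List<String[]> rows = Arrays.asList(
                new String[]{"movie", "u1"},
                new String[]{"movie", "u1"},
                new String[]{"movie", "u2"});

        // Original query: inner GROUP BY (name, buvid) deduplicates, then counts
        long original = rows.stream()
                .map(r -> r[0] + "|" + r[1])
                .distinct()
                .count();

        // Converted query: the inner deduplication is lost, so raw rows are counted
        long converted = rows.size();

        System.out.println(original);  // 2 distinct users
        System.out.println(converted); // 3 raw rows
    }
}
```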
I just cloned Coral and ran ./gradlew build in the root path, then imported the project into IDEA, but IDEA can't find shaded source classes like "coral.shade....". The IDEA version is 2020.1. I found a related issue (https://youtrack.jetbrains.com/issue/IDEA-163411), but it didn't resolve the problem. What can I do?
Coral hardcodes "hive" as the default catalog name when translating views to Presto;
i.e. a view HQL SELECT a FROM test_schema.myview
translates to SELECT a FROM hive.test_schema.myview in Presto.
This is problematic for users who don't have "hive" as their default catalog name.
Presto Users reported this regression since coral was integrated for more complex hive view support: trinodb/trino#5785
We need to modify the translation such that the underlying tables are resolved to the correct catalog name.
Some valid input Trino SQLs, like:
select * from unnest(array[1, 2, 3])
select x from unnest(array[1, 2, 3]) t(x)
select * from unnest(array[1, 2, 3]) with ordinality
SELECT * from default.struct cross join unnest(struct.b) AS t(b1col, b2col)
where table default.struct is defined as default.struct(a int, b array<struct<b1:string, b2:int>>),
cannot be converted to the Coral IR representation.
For RelNode -> SqlNode translations of SQLs with LATERAL VIEW UNNEST(..), Coral only appends one (inner) AS operator. The generated SqlNode looks like:
SELECT t0.ccol .... LATERAL VIEW UNNEST(col) AS t0 (ccol)...
However, the same underlying code in Calcite appends two AS operators, in the following fashion:
SELECT t00.ccol .... LATERAL VIEW UNNEST(col) AS t0 (ccol) AS t00...
Why does Coral deviate, and is it possible to follow the original Calcite syntax in Coral?
public void testTable() {
String sql = "select b, '中文' from test.t2 where b = a";
RelNode relNode = TestUtils.convertTable(sql);
RelToTrinoConverter relToTrinoConverter = new RelToTrinoConverter();
SqlNode expandedSql = relToTrinoConverter.convertToSqlNode(relNode);
System.out.println(expandedSql);
}
Many of our Hive view queries fail to execute in Presto due to lack of support for the function greatest(<NUMERIC>, <NUMERIC>). Would it be possible to add support for this? We're currently using prestosql 346, which uses Coral 1.0.12.
I just downloaded https://mvnrepository.com/artifact/com.linkedin.coral/coral-trino-parser/2.0.53 and inspected its content, and there are no shaded sources of the trino-parser library in it.
I've also seen in #222 that trino-parser is a compile-only dependency after the initial attempt to shade it: #220
Is there any valid reason to publish coral-trino-parser if it is empty?
Related issue: #205
Currently Coral lacks support for LATERAL VIEW json_tuple, a Hive syntax leveraging the json_tuple UDTF.
The main difference between json_tuple and other UDTFs like explode is that json_tuple is a column-generating function (in contrast to explode, which is a row-generating function).
json_tuple has the following signature in Hive:
json_tuple(string jsonStr, string k1, ..., string kn)
where jsonStr is a string or column reference (of string type) holding a serialized JSON object, and k1, ..., kn are the field names to extract from the jsonStr JSON object. So, given jsonStr:
{"user_id": 1000, "name": "Mateusz", "surname": "Gajewski"}
SELECT json_tuple(jsonStr, "name", "surname", "user_id")
will return a tuple (this Hive concept is weird, tbh):
"1000", "Mateusz", "Gajewski"
and by default those columns get automatically generated names (c0, c1, ...).
Hive also allows naming those columns, which Coral doesn't handle either:
SELECT json_tuple(jsonStr, "name", "surname", "user_id") as (name, surname, user_id)
And finally, Hive's:
SELECT v.col, v.col2 FROM json_test t
LATERAL VIEW json_tuple(t.col, 'field', 'field2') v as col, col2
should be translated to a Trino/Presto query of this (or a similar) shape:
SELECT v.field AS col, v.field2 AS col2 from json_test t
CROSS JOIN LATERAL (
SELECT cast(json_parse(t.col) as ROW(field varchar, field2 varchar)) as v
);
Please note that thanks to Hive's AS clause, it's possible to extract field from JSON as column col.
Coral translates the following Hive SQL to Trino:
select regexp_extract('1a 2b 14m', '\d+', 1);
Result:
select regexp_extract('1a 2b 14m', "hive_pattern_to_presto"('\d+'), 1);
But Trino does not have a function named "hive_pattern_to_presto", so the query fails with the exception:
Function 'hive_pattern_to_trino' not registered
So, why was the "hive_pattern_to_trino" function introduced? Is it implemented internally at LinkedIn?
For a given Hive SQL with LATERAL VIEW, the generated RelNode representation has LogicalCorrelate / LogicalJoin depending on the input SQL's syntax.
For input Hive SQL - SELECT col FROM (SELECT ARRAY('a1', 'a2') as a) LATERAL VIEW EXPLODE(a) a_alias AS col
RelNode representation is -
LogicalProject(col=[$1])
LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
LogicalProject(a=[ARRAY('a1', 'a2')])
LogicalValues(tuples=[[{ 0 }]])
HiveUncollect
LogicalProject(col=[$cor0.a])
LogicalValues(tuples=[[{ 0 }]])
For input SQL - SELECT arr.alias FROM foo LATERAL VIEW EXPLODE(ARRAY('a', 'b')) arr as alias
RelNode representation -
LogicalProject(alias=[$3])
LogicalJoin(condition=[true], joinType=[inner])
LogicalTableScan(table=[[hive, default, foo]])
HiveUncollect
LogicalProject(col=[ARRAY('a', 'b')])
LogicalValues(tuples=[[{ 0 }]])
Both translations should result only in LogicalCorrelate in the RelNode representation.
To reproduce this issue, add this test to ViewToAvroSchemaConverterTests and run it:
@Test
public void foo() {
String viewSql = "CREATE VIEW v AS SELECT id as id FROM basecomplex";
TestUtils.executeCreateViewQuery("default", "v", viewSql);
ViewToAvroSchemaConverter viewToAvroSchemaConverter = ViewToAvroSchemaConverter.create(hiveMetastoreClient);
Schema actualSchema = viewToAvroSchemaConverter.toAvroSchema("default", "v", false, false);
System.out.println(actualSchema);
}
the output will be:
{"type":"record","name":"v","namespace":"default.v","fields":[{"name":"Id","type":"int"}]}
which shows the column name is Id rather than the explicit lower-cased alias id.
At present, in ParseTreeBuilder.visitLateralViewExplode, we assume LATERAL VIEW is used with EXPLODE only. It should be made more generic to support other UDTFs.
When bumping the Coral dependency version in Trino, we noticed that coral-trino has a new dependency on trino-parser.
One of our maintainers brought up the fact that trino-parser is not designed to be depended on by other projects, i.e. it is not guaranteed to maintain backward compatibility. To preempt future incompatibilities, would the folks at Coral consider shading the trino-parser dependency?
For reference, the conversation is here.
I am having a problem with sql translation to Trino that results in a query with an identifier column.
SELECT numbers , animal , n , a
FROM (
VALUES
( ARRAY [ 2 , 5 ], ARRAY [ 'dog' , 'cat' , 'bird' ]),
( ARRAY [ 7 , 8 , 9 ], ARRAY [ 'cow' , 'pig' ])
) AS x ( numbers , animals )
CROSS JOIN UNNEST ( numbers, 动物) AS t ( n , a );
While adding support for window functions, I noticed an issue in the current design of ParseTreeBuilder that caused the following query:
SELECT `a`, RANK() OVER (PARTITION BY `b` ORDER BY `c`) FROM `foo`
to be translated into a Calcite SqlNode tree as the following:
SELECT `a`, RANK() OVER (PARTITION BY `b` ORDER BY `c`) FROM `foo` ORDER BY `c`
The extra "ORDER BY `c`" at the end of the query is due to the following issue in the current design of ParseTreeBuilder.
com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder has the following method, which uses a context to pass information around while converting the Hive ASTNode tree to a Calcite SqlNode tree:
private SqlNode processAST(ASTNode node, @Nullable Table hiveView) {
ParseContext ctx = new ParseContext(hiveView);
return visit(node, ctx);
}
In the following function, it picks up "order by" clause by looking it up via the context:
@Override
protected SqlNode visitQueryNode(ASTNode node, ParseContext ctx) {
ArrayList<Node> children = node.getChildren();
checkState(children != null && !children.isEmpty());
SqlNode cte = null;
ParseContext qc = new ParseContext(ctx.getHiveTable().orElse(null));
for (Node child : node.getChildren()) {
ASTNode ast = (ASTNode) child;
if (ast.getType() == HiveParser.TOK_CTE) {
// Child of type TOK_CTE represents the "WITH" list
/** See {@link #visitCTE(ASTNode, ParseContext) visitCTE} for the return value */
cte = visit(ast, new ParseContext(null));
} else {
// The return values are ignored since all other children of the SELECT query will be captured via ParseContext qc.
visit(ast, qc);
}
}
SqlSelect select = new SqlSelect(ZERO, qc.keywords, qc.selects, qc.from, qc.where, qc.grpBy, qc.having, null,
qc.orderBy, null, qc.fetch);
if (cte != null) {
// Calcite uses "SqlWith(SqlNodeList of SqlWithItem, SqlSelect)" to represent queries with WITH
/** See {@link #visitCTE(ASTNode, ParseContext) visitCTE} for details */
return new SqlWith(ZERO, (SqlNodeList) cte, select);
} else {
return select;
}
}
So any "ORDER BY" anywhere within the SELECT clause will be picked up.
To fix the issue, I would like to get rid of the use of the context and instead pass the orderBy SqlNode directly via the return value of the visit function.
Any thoughts @wmoustafa @funcheetah @antumbde ?
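The leak is easy to reproduce outside Calcite. The toy sketch below (hypothetical classes, not Coral's) contrasts the two designs: stashing an ORDER BY key into a shared mutable field lets a window's ORDER BY escape into the enclosing query, while returning the clause from the visit keeps it scoped.

```java
// Toy sketch (hypothetical, not Coral's classes) contrasting the two visitor designs.
public class ContextLeakDemo {
    // Shared-context style: any ORDER BY seen anywhere lands in one mutable field.
    static String sharedOrderBy = null;

    static void visitWindowSpec(String orderKey) {
        sharedOrderBy = orderKey; // the window's ORDER BY leaks into the enclosing query
    }

    static String buildWithSharedContext() {
        visitWindowSpec("c"); // visiting the window spec inside the SELECT list
        return "SELECT a, RANK() OVER (PARTITION BY b ORDER BY c) FROM foo"
                + (sharedOrderBy != null ? " ORDER BY " + sharedOrderBy : "");
    }

    // Return-value style: the clause is returned to the caller that owns it,
    // so the window's ORDER BY never reaches the outer query.
    static String visitOrderKey(String key) {
        return key;
    }

    static String buildWithReturnValues() {
        String windowSpec = "OVER (PARTITION BY b ORDER BY " + visitOrderKey("c") + ")";
        return "SELECT a, RANK() " + windowSpec + " FROM foo";
    }

    public static void main(String[] args) {
        System.out.println(buildWithSharedContext()); // spurious trailing ORDER BY c
        System.out.println(buildWithReturnValues());  // no trailing ORDER BY
    }
}
```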
If we create a Hive view with columns that are not explicitly named in the SQL, Hive will autogenerate column names for such columns when persisting the view to the metastore. E.g., consider the following view:
hive> CREATE VIEW v AS SELECT TRUE, lower('STR') FROM some_table;
Here is the persisted information in the metastore
hive> DESC FORMATTED v;
OK
# col_name data_type comment
_c0 boolean
_c1 string
.
.
.
# View Information
View Original Text: SELECT TRUE, lower('STR') FROM some_table
View Expanded Text: SELECT TRUE, lower('STR') FROM `some_table`
Trying to retrieve the view schema using coral-schema for this view returns EXPR_0 and EXPR_1 as column names, which I assume are auto-generated by Calcite. I think we should respect the column names provided by Hive here, i.e. _c0 and _c1, since they have been persisted as column metadata in the metastore.
Can you address a couple of instances of language in the project where the wording can be made more inclusive? (Example: whitelist -> allowlist.) We are trying to make the code in all LinkedIn projects more inclusive. Could you please examine whether these need to be updated, and make the changes? For suggested replacements see go/inclusivelanguage or Google. THANK YOU!
Term | URL |
---|---|
Blacklist | https://github.com/linkedin/coral/blob/b436653fe94c1b6bab30fcf7c9fd1869e2461afd/coral-spark/src/main/java/com/linkedin/coral/spark/UnsupportedHiveUDFsInSpark.java |
Can Coral provide stable APIs to translate expressions from Hive to Presto?
Use case (as requested by Praveen Krishna):
Presto allows connectors to provide row filters in terms of expressions. It's possible that they're defined in HiveQL, and Presto needs to read and transform them before applying them. For this purpose, the ability to translate expressions directly is useful.
For example, Presto can be configured so that for queries by user U on table tpch.tiny.orders, rows with orderKey >= 10 are excluded. This is defined as an expression written as orderKey < 10 in the configuration. Currently, Presto requires this expression to be in a form that Presto can understand. If Coral allowed Hive --> Presto translation of such expressions, they could be expressed as Hive expressions and translated at runtime.
A repeated error came up while testing TrinoToRelConverter: coral.shading.io.trino.sql.parser.ParsingException: line 1:8: backquoted identifiers are not supported; use double quotes to quote identifiers. The issue is that during the Trino-to-Rel conversion, if we expand the view, backquotes are added and the query is then rejected by the Trino SqlParser.
The SQL below throws an exception:
SELECT date('2021-01-02') as a
com.linkedin.coral.hive.hive2rel.functions.UnknownSqlFunctionException: Unknown function name: date
at com.linkedin.coral.hive.hive2rel.functions.HiveFunctionResolver.tryResolve(HiveFunctionResolver.java:124)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitFunctionInternal(ParseTreeBuilder.java:572)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitFunction(ParseTreeBuilder.java:563)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitFunction(ParseTreeBuilder.java:81)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:122)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.lambda$visitChildren$0(AbstractASTVisitor.java:286)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:286)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:282)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitSelectExpr(ParseTreeBuilder.java:580)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitSelectExpr(ParseTreeBuilder.java:81)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:98)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.lambda$visitChildren$0(AbstractASTVisitor.java:286)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:286)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:282)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitSelect(ParseTreeBuilder.java:605)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitSelect(ParseTreeBuilder.java:81)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:95)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.lambda$visitChildren$0(AbstractASTVisitor.java:286)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:286)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:282)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitInsert(AbstractASTVisitor.java:522)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:86)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.lambda$visitChildren$0(AbstractASTVisitor.java:286)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:286)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:282)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitQueryNode(ParseTreeBuilder.java:733)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitQueryNode(ParseTreeBuilder.java:81)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:66)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.lambda$visitChildren$0(AbstractASTVisitor.java:286)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:286)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:282)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitNil(ParseTreeBuilder.java:739)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitNil(ParseTreeBuilder.java:81)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:47)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.processAST(ParseTreeBuilder.java:176)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.process(ParseTreeBuilder.java:168)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.processSql(ParseTreeBuilder.java:161)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilderTest.convert(ParseTreeBuilderTest.java:145)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilderTest.testConvertAndValidate(ParseTreeBuilderTest.java:127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:701)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:893)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1218)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
at org.testng.TestRunner.privateRun(TestRunner.java:758)
at org.testng.TestRunner.run(TestRunner.java:613)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291)
at org.testng.SuiteRunner.run(SuiteRunner.java:240)
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:87)
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1170)
at org.testng.TestNG.runSuitesLocally(TestNG.java:1095)
at org.testng.TestNG.run(TestNG.java:1007)
at com.intellij.rt.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:66)
at com.intellij.rt.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:109)
Relates to trinodb/trino#8789
Can Presto SQL be converted to Hive SQL?
We have SQL that uses || for string concatenation, e.g. select 'a' || 'b', but Coral does not support this syntax. Can someone help support this feature?
Coral version: 2.0.86
SQL: select '2022-01-01' rlike '^\\d{4}-\\d{2}-\\d{2}$|^\\d{4}-\\d{2}$' returns true in Hive.
However, Coral's translation result is SELECT "REGEXP_LIKE"('2022-01-01', '^\\d{4}-\\d{2}-\\d{2}$|^\\d{4}-\\d{2}$') FROM (VALUES (0)) AS "t" ("ZERO"), which returns false in Trino. The expected translation result is SELECT "REGEXP_LIKE"('2022-01-01', '^\d{4}-\d{2}-\d{2}$|^\d{4}-\d{2}$') FROM (VALUES (0)) AS "t" ("ZERO") (each double backslash should be translated to a single backslash).
Am I using it wrong, or is this a bug?
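For context on why the doubled backslash matters: Hive processes backslash escapes inside single-quoted literals, so '\\d' reaches the regex engine as \d (a digit class), whereas Trino string literals keep backslashes verbatim. A standalone Java sketch (java.util.regex, not Coral code) of the two patterns the engine ends up seeing:

```java
import java.util.regex.Pattern;

public class BackslashDemo {
    public static void main(String[] args) {
        String input = "2022-01-01";

        // What Hive hands to the regex engine for '...\\d{2}...':
        // the escape is processed, so the engine sees \d (a digit class).
        String processed = "^\\d{4}-\\d{2}-\\d{2}$"; // Java source "\\d" == regex \d

        // What Trino's engine sees if the doubled backslash is kept verbatim:
        // a literal backslash followed by repeated 'd', which matches nothing here.
        String verbatim = "^\\\\d{4}-\\\\d{2}-\\\\d{2}$"; // regex \\d == literal '\' + 'd'

        System.out.println(Pattern.matches(processed, input)); // true
        System.out.println(Pattern.matches(verbatim, input));  // false
    }
}
```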
After adding trino2rel, there are a few classes that both coral-hive and coral-trino use that can now be moved to coral-common, such as DaliOperatorTable.
These changes will also be helpful when adding future support for other SqlLanguage2rel conversions.
Hive SQL containing window functions is not supported.
Exception:
Unhandled Hive AST token TOK_WINDOWSPEC, tree:
`
TOK_WINDOWSPEC
TOK_PARTITIONINGSPEC
TOK_DISTRIBUTEBY
TOK_TABLE_OR_COL
c1
TOK_ORDERBY
TOK_TABSORTCOLNAMEDESC
TOK_TABLE_OR_COL
c2
`
for example:
column | type
c0 | string
c1 | string
c2 | int
select c0, c1, c2, lead(c0, 1) over ( PARTITION BY c1 ORDER BY c2 desc ) c3 FROM hive.tablex
SELECT c0, c1, ROW_NUMBER() over ( PARTITION BY c0 ORDER BY c1 desc ) rw FROM hive.tablex
Currently dynamic UDF resolution is only testable through integration tests as it relies on communicating with an Artifact Management Repository, typically an external service. At this point, integration tests are not open source either. This issue is to track creating a unit test that tests this functionality by creating a local artifact management repository and leveraging it in unit tests.
The Hive view is defined as
CREATE VIEW v1 AS SELECT NULL AS c1 FROM t
When using it in Trino it throws an error
Unknown type 'NULL' for column 'c1' in view: hive.default.v1
It looks like Hive has the type void for NULL, while Trino doesn't, and Trino still enforces a type for every column.
A workaround is to modify the Hive view definition by adding an explicit cast (to any type valid in both Hive and Trino):
CREATE VIEW v1 AS SELECT CAST(NULL AS INT) AS c1 FROM t
This will be accepted by Trino and still works in Hive.
Regarding the translation from Hive to Trino, does it make sense to convert a plain NULL column into a CAST?
OperatorTransformer is a re-implementation of the UDFTransformer functionality, but at the SqlNode level.
Currently, during Hive -> Spark/Trino SQL translations, Coral creates a RelNode plan. The TableScan RelNode has the representation LogicalTableScan(table=[[hive, dbName, tableName]]).
When this RelNode is translated to a SqlNode, "hive" (the catalog name) has to be removed explicitly.
Filing this issue to investigate whether introducing the catalog name in the RelNode can be avoided.
Translation of struct<> field references to Presto fails with 1.0.25 (released 4 days ago).
Sample:
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.calcite.rel.RelNode;

import com.linkedin.coral.hive.hive2rel.HiveToRelConverter;
import com.linkedin.coral.presto.rel2presto.RelToPrestoConverter;

public class Main {
  public static void main(String[] args) throws Exception {
    // Local metastore: database "foo" with table "bar" holding a struct column
    Map<String, Map<String, List<String>>> localMetaStore = new HashMap<>();
    Map<String, List<String>> tables = new HashMap<>();
    tables.put("bar", Arrays.asList("col|struct<i:int,s:string>"));
    localMetaStore.put("foo", tables);
    HiveToRelConverter converter = HiveToRelConverter.create(localMetaStore);
    RelToPrestoConverter presto = new RelToPrestoConverter();
    // Selecting the whole struct works
    RelNode rel = converter.convertSql("select col from foo.bar");
    System.out.println(presto.convert(rel));
    // Referencing 'i' in struct 'col' fails
    rel = converter.convertSql("select col.i from foo.bar");
    System.out.println(presto.convert(rel));
  }
}
Stacktrace:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
SELECT "col"
FROM "foo"."bar"
Exception in thread "main" org.apache.calcite.runtime.CalciteContextException: At line 0, column 0: Table 'col' not found
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:463)
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:834)
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:819)
at org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:4867)
at org.apache.calcite.sql.validate.DelegatingScope.fullyQualify(DelegatingScope.java:370)
at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:5759)
at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:5744)
at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:317)
at org.apache.calcite.sql.validate.SqlValidatorImpl.expand(SqlValidatorImpl.java:5351)
at com.linkedin.coral.hive.hive2rel.HiveSqlValidator.expand(HiveSqlValidator.java:61)
at org.apache.calcite.sql.validate.SqlValidatorImpl.expandSelectItem(SqlValidatorImpl.java:447)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelectList(SqlValidatorImpl.java:4104)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3392)
at org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1005)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:965)
at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:216)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:940)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:647)
at com.linkedin.coral.hive.hive2rel.HiveSqlToRelConverter.convertQuery(HiveSqlToRelConverter.java:57)
at com.linkedin.coral.hive.hive2rel.HiveToRelConverter.toRel(HiveToRelConverter.java:126)
at com.linkedin.coral.hive.hive2rel.HiveToRelConverter.convertSql(HiveToRelConverter.java:81)
at com.vmware.cb.product.metrics.Main.main(Main.java:51)
Caused by: org.apache.calcite.sql.validate.SqlValidatorException: Table 'col' not found
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:463)
at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:572)
... 23 more
Currently Coral uses Hive 1.0; are there any plans to support other Hive versions?
This causes issues for systems on Hive 2.1.0 and above: for example, several date functions such as date_add had their function signatures changed in the 2.1.0 release. Reference: https://cwiki.apache.org/confluence/display/hive/languagemanual+udf
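To illustrate the kind of signature change involved: before Hive 2.1.0, date_add('2008-12-31', 1) returned a string, while from 2.1.0 on it returns a date value. The snippet below only models the two return types with java.time; it does not call Hive:

```java
import java.time.LocalDate;

public class DateAddSemantics {
    // Hive < 2.1.0: date_add returns a STRING such as '2009-01-01'
    static String dateAddPre210(String startDate, int days) {
        return LocalDate.parse(startDate).plusDays(days).toString();
    }

    // Hive >= 2.1.0: date_add returns a DATE value
    static LocalDate dateAddPost210(String startDate, int days) {
        return LocalDate.parse(startDate).plusDays(days);
    }

    public static void main(String[] args) {
        System.out.println(dateAddPre210("2008-12-31", 1));  // 2009-01-01 (as a string)
        System.out.println(dateAddPost210("2008-12-31", 1)); // 2009-01-01 (as a date)
    }
}
```

A translator pinned to the Hive 1.0 signatures can therefore infer the wrong result type for such functions when the source runs a newer Hive.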
Currently Coral only supports translations for select kinds of queries. If we want to translate CTAS queries, it throws an error during AST node to SqlNode conversion.