Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
License: BSD 2-Clause "Simplified" License
The objective of Coral-Spark is to translate a view's logical plan to Spark SQL.
Unfortunately, the entry point to Coral is RelNode, which is hard to construct from a view's logical plan.
I've looked into the test cases, and it seems we cannot live without Hive: HiveToRelConverter needs a full-fledged Hive system. But to build a RelNode we only need Spark's catalog (which can be anything: HMS, SparkCatalog, or any third-party catalog).
However, I have a requirement to convert a pure LogicalPlan -> String to re-generate SQL.
Any ideas on that?
We had a few offline conversations about the n-to-one-to-n proposal, which is to support translating any engine's query to any other engine's query.
This issue is the feature request to introduce Presto SQL as an input query language: translate it into the intermediate Coral representation, from which it can be translated to the other engines Coral already supports as output.
There are a few stages for this feature:
The gson dependency at version 2.8.1 is affected by CVE GHSA-4jrv-ppp4-jm57.
Hello,
Sorry for leaving a message here, but I couldn't find a place to ask about good usage (or examples) of this project.
Is there some kind of open discussion channel (Gitter, Discord, ...)?
There is a Slack link, but it doesn't seem to be open.
Thanks.
Currently, Coral supports unnesting an operand into a single row type.
Example: for a table default.struct defined as default.struct(a int, b array<struct<b1:string, b2:int>>), the input query
SELECT * from default.struct cross join unnest(struct.b) AS t(ccol)
has a CoralIR representation.
However, OSS Trino also supports expanding the unnest operand into multiple columns, yet Coral doesn't accept the following as valid Trino input SQL: SELECT * from default.struct cross join unnest(struct.b) AS t(ccol1, ccol2)
A potential solution is to introduce two types of uncollect RelNodes in CoralIR, (1) Uncollect1 and (2) UncollectN, to determine how many columns unnest() should expand to (one or more).
Hive --> Presto translation for a view containing CASE expressions of the following form fails:
SELECT CASE n_regionkey WHEN 0 THEN 'Africa' WHEN 1 THEN 'America' END region_name FROM nation;
More details and a stack trace in trinodb/trino#5837.
However, CASE expressions of the form CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END are translated correctly.
I am having a problem with Hive view translation to Trino that results in a query with an ambiguous column.
I narrowed it down to a simple view that reproduces the issue, and I also identified why it happens: a combination of how the SELECT is rewritten and the column aliases used.
Conditions:
Given these 2 tables:
create table hive.test.table_a (
some_id string
);
create table hive.test.table_b (
some_id string
);
These 2 views will have the same issue:
-- Hive view version 1
create or replace view test.view_ab as
select a.some_id
from test.table_a a
left join
(
select trim(some_id) AS SOME_ID
from test.table_b
) b
on a.some_id = b.some_id
where a.some_id != ''
-- Hive view version 2
create or replace view test.view_ab2 as
select a.some_id
from test.table_a a
left join
(
select some_id AS SOME_ID
from test.table_b
) b
on a.some_id = trim(b.some_id)
where a.some_id != ''
The translated Hive view in Trino will look like this:
CREATE VIEW hive.test.view_ab SECURITY DEFINER AS
SELECT some_id
FROM
(
SELECT
table_a.some_id some_id
, t.SOME_ID SOME_ID
FROM
(test.table_a
LEFT JOIN (
SELECT
TRIM(some_id) SOME_ID
, CAST(TRIM(some_id) AS VARCHAR(65536)) $f1
FROM
test.table_b
) t ON (table_a.some_id = t.$f1))
) t0
WHERE (t0.some_id <> '')
As you can see, there are two columns with the same alias in the first sub-query:
table_a.some_id some_id
, t.SOME_ID SOME_ID
The problem goes away if the alias is specified in lower case in the Hive view definition, which would seem like an easy fix unless you have 50k+ views owned by many different teams across the organization.
I managed to fix the issue in my local environment by lower-casing the node text for identifiers in the com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder class, but I am not sure if this is the best place to do it, or the best way to resolve the problem. Would that be an acceptable change?
@Override
protected SqlNode visitIdentifier(ASTNode node, ParseContext ctx) {
  // Normalize identifier text so aliases compare consistently regardless of case
  return new SqlIdentifier(node.getText().toLowerCase(Locale.ROOT), ZERO);
}
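As a side note on the snippet above, Locale.ROOT is the right choice there: String.toLowerCase with the default locale is locale-sensitive, and under a Turkish locale the letter 'I' lower-cases to a dotless 'ı', silently changing the identifier. A minimal standalone illustration (plain Java, not Coral code):

```java
import java.util.Locale;

public class LocaleLowerDemo {
    public static void main(String[] args) {
        String alias = "SOME_ID";
        // Locale-independent lower-casing: always "some_id"
        System.out.println(alias.toLowerCase(Locale.ROOT));
        // Turkish locale maps 'I' to dotless 'ı', producing "some_ıd"
        System.out.println(alias.toLowerCase(new Locale("tr")));
    }
}
```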
The table structure is as follows:
public static final TestTable TABLE_NULLSCOLLATION = new TestTable("nullscollation", ImmutableMap.of("name", SqlTypeName.VARCHAR,
"id", SqlTypeName.BIGINT, "fans", SqlTypeName.BIGINT, "log_date", SqlTypeName.VARCHAR));
The table data is as follows:
|NULL|456789|NULL|20210426|
|24784|556789|14535|20210426|
|NULL|33456789|NULL|20210426|
|96344|454888|14255|20210426|
|96344|454888|14255|20210426|
|NULL|7556789|NULL|20210426|
Raw SQL:
SELECT name, id, fans FROM nullscollation
WHERE log_date = 20210426
ORDER BY fans DESC
LIMIT 3
Converted SQL:
SELECT "name", "id", "fans"
FROM "nullscollation"
WHERE TRY_CAST("log_date" AS INTEGER) = 20210426
ORDER BY "fans" DESC NULLS FIRST
LIMIT 3
Original SQL result using Hive or Presto:
|24784|556789|14535|
|96344|454888|14255|
|96344|454888|14255|
Converted SQL result using Presto:
|NULL|456789|NULL|
|NULL|33456789|NULL|
|NULL|7556789|NULL|
Hive defaults to NULLS LAST for DESC and NULLS FIRST for ASC. Calcite's defaultNullCollation is HIGH (NULLS FIRST for DESC, NULLS LAST for ASC), and Coral does not set a null collation. We can set the null collation to NullCollation.LOW when creating the Hive SQL validator; that makes the query results match.
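The difference is easy to see with plain Java collections (a sketch over the sample "fans" column above, not Coral code): Hive's DESC ordering treats nulls as lowest (NULLS LAST), while Calcite's HIGH collation treats them as highest (NULLS FIRST), which flips the LIMIT 3 result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NullCollationDemo {
    public static void main(String[] args) {
        // "fans" column from the example table, null rows included
        List<Integer> fans = Arrays.asList(null, 14535, null, 14255, 14255, null);

        // Hive's default for ORDER BY ... DESC: nulls sort last
        List<Integer> hive = new ArrayList<>(fans);
        hive.sort(Comparator.nullsLast(Comparator.<Integer>reverseOrder()));

        // Calcite's HIGH collation for DESC: nulls sort first
        List<Integer> calciteHigh = new ArrayList<>(fans);
        calciteHigh.sort(Comparator.nullsFirst(Comparator.<Integer>reverseOrder()));

        System.out.println(hive.subList(0, 3));        // [14535, 14255, 14255]
        System.out.println(calciteHigh.subList(0, 3)); // [null, null, null]
    }
}
```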
SQL:
select
countdate, -- statistics date
fstrmerchantid_name,
SUBSTRING(fstrmerchantid_name,1,instr(fstrmerchantid_name,'-')-1) as fstrmerchantid,
fuifundprojectid_name,
SUBSTRING(fuifundprojectid_name,1,instr(fuifundprojectid_name,'-')-1) as fuifundprojectid,
l0_fuiconfigquota,
l0_fuifundprojectidnames,
l0_fuiquotatypedesc,
l0_fstrfailunconfirmamounttypedesc,
l0_fuiwaitlendamount,
l0_fuilendprocessingamount,
l0_fuilendedamount,
l0_fuiremaintolendamount,
l1_fuiconfigquota,
l1_fstrtotalamounttypedesc,
l1_fstrfailunconfirmamounttypedesc,
l1_fuiwaitlendamount,
l1_fuilendprocessingamount,
l1_fuilendedamount,
l1_fuiremaintolendamount
from xy_mart.fact_loan_fund_lend_quota
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
Code:
RelNode relNode = trinoToRelConverter.convertSql(a);
RelToTrinoConverter relToTrinoConverter = new RelToTrinoConverter();
String expandedSql = relToTrinoConverter.convert(relNode);
Hive supports CONCAT(date, varchar), while Trino only supports concat(char(x), char(y)), concat(array(E), E), concat(E, array(E)), concat(array(E)), concat(varchar), and concat(varbinary).
We need to handle this corner case (and maybe other similar ones) for Coral-Trino.
Currently, if we union the following 2 fields:
type: int, default: 1
and
type: [null, int], default: null
the returned field is:
type: [int, null], default: 1
this behavior may need to be adjusted to:
You can use the following test case to reproduce the issue.
public static final TestTable TABLE_FIVE = new TestTable("dwd_ogv_app_play_click_tf_inc_d", ImmutableMap.of("r_type", SqlTypeName.INTEGER,
"ogv_type_name", SqlTypeName.VARCHAR, "buvid", SqlTypeName.VARCHAR, "log_date", SqlTypeName.VARCHAR));
@Test
public void testSelect() {
String sql = String.format("select t.ogv_type_name, count(1) as user_amt from (SELECT ogv_type_name,\n" +
"buvid\n" +
"FROM dwd_ogv_app_play_click_tf_inc_d\n" +
"WHERE log_date = '20210312'\n" +
"AND r_type = 1\n" +
"AND ogv_type_name IS NOT NULL\n" +
"GROUP BY ogv_type_name, buvid) t GROUP BY\n" +
"t.ogv_type_name", tableFive);
String expectedSql = "";
testConversion(sql, expectedSql);
}
After conversion, the Presto query is:
SELECT "ogv_type_name" AS "OGV_TYPE_NAME", COUNT(*) AS "USER_AMT"
FROM "dwd_ogv_app_play_click_tf_inc_d"
WHERE "log_date" = '20210312' AND "r_type" = 1 AND "ogv_type_name" IS NOT NULL
GROUP BY "ogv_type_name"
The semantics of the two queries are clearly inconsistent.
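The divergence can be sketched with plain Java streams over hypothetical rows (not Coral code): the original query counts distinct (ogv_type_name, buvid) pairs because the inner GROUP BY deduplicates, while the converted query counts raw rows.

```java
import java.util.Arrays;
import java.util.List;

public class GroupBySemanticsDemo {
    public static void main(String[] args) {
        // Hypothetical (ogv_type_name, buvid) rows; user u1 clicked twice
        List<String[]> rows = Arrays.asList(
                new String[]{"movie", "u1"},
                new String[]{"movie", "u1"},
                new String[]{"movie", "u2"});

        // Original query: inner GROUP BY (name, buvid) deduplicates, then counts
        long original = rows.stream()
                .map(r -> r[0] + "|" + r[1])
                .distinct()
                .count();

        // Converted query: the inner deduplication is lost, so raw rows are counted
        long converted = rows.size();

        System.out.println(original);  // 2 distinct users
        System.out.println(converted); // 3 raw rows
    }
}
```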
I just cloned Coral and ran ./gradlew build in the root path, then imported the project into IDEA, but IDEA can't find shaded source classes like "coral.shade....". The IDEA version is 2020.1. I found a related issue (https://youtrack.jetbrains.com/issue/IDEA-163411), but it didn't resolve the problem. What can I do?
Coral hardcodes "hive" as the default catalog name when translating views to Presto;
i.e. a view HQL SELECT a FROM test_schema.myview
translates to SELECT a FROM hive.test_schema.myview in Presto.
This is problematic for users who don't have "hive" as their default catalog name.
Presto Users reported this regression since coral was integrated for more complex hive view support: trinodb/trino#5785
We need to modify the translation such that the underlying tables are resolved to the correct catalog name.
Some valid input Trino SQLs, like:
select * from unnest(array[1, 2, 3])
select x from unnest(array[1, 2, 3]) t(x)
select * from unnest(array[1, 2, 3]) with ordinality
SELECT * from default.struct cross join unnest(struct.b) AS t(b1col, b2col)
where table default.struct is defined as default.struct(a int, b array<struct<b1:string, b2:int>>),
cannot be converted to the Coral IR representation.
For RelNode -> SqlNode translations of SQLs with LATERAL VIEW UNNEST(..), Coral only appends one (inner) AS operator. The generated SqlNode looks like:
SELECT t0.ccol .... LATERAL VIEW UNNEST(col) AS t0 (ccol)...
However, the same underlying code in Calcite appends two AS operators, in the following fashion:
SELECT t00.ccol .... LATERAL VIEW UNNEST(col) AS t0 (ccol) AS t00...
Why does Coral deviate, and is it possible to follow the original Calcite syntax in Coral?
public void testTable() {
String sql = "select b, '中文' from test.t2 where b = a";
RelNode relNode = TestUtils.convertTable(sql);
RelToTrinoConverter relToTrinoConverter = new RelToTrinoConverter();
SqlNode expandedSql = relToTrinoConverter.convertToSqlNode(relNode);
System.out.println(expandedSql);
}
Many of our Hive view queries fail to execute in Presto due to lack of support for the function greatest(<NUMERIC>, <NUMERIC>). Would it be possible to add support for this? We're currently using prestosql 346, which uses Coral 1.0.12.
I just downloaded https://mvnrepository.com/artifact/com.linkedin.coral/coral-trino-parser/2.0.53 and inspected its content, and there are no shaded sources of the trino-parser library in it.
I've also seen in #222 that trino-parser is a compile-only dependency after the initial attempt to shade it: #220
Is there any valid reason to publish coral-trino-parser if it is empty?
Related issue: #205
Currently Coral lacks support for LATERAL VIEW json_tuple, a Hive syntax leveraging the json_tuple UDTF.
The main difference between json_tuple and other UDTFs like explode is that json_tuple is a column-generating function (in contrast to explode, which is a row-generating function).
json_tuple has the following signature in Hive:
json_tuple(string jsonStr, string k1, ..., string kn)
where jsonStr is a string or column reference (of string type) holding a serialized JSON object, and k1, ..., kn are the field names to extract from the jsonStr JSON object. So, given jsonStr:
{"user_id": 1000, "name": "Mateusz", "surname": "Gajewski"}
SELECT json_tuple(jsonStr, "name", "surname", "user_id")
will return a tuple (this Hive concept is weird, tbh):
"1000", "Mateusz", "Gajewski"
and by default those columns get automatically generated names (c0, c1, ...).
Hive also allows naming those columns, which Coral doesn't handle either:
SELECT json_tuple(jsonStr, "name", "surname", "user_id") as (name, surname, user_id)
And finally, Hive's:
SELECT v.col, v.col2 FROM json_test t
LATERAL VIEW json_tuple(t.col, 'field', 'field2') v as col, col2
should be translated to a Trino/Presto query of this (or a similar) shape:
SELECT v.field AS col, v.field2 AS col2 from json_test t
CROSS JOIN LATERAL (
SELECT cast(json_parse(t.col) as ROW(field varchar, field2 varchar)) as v
);
Please note that thanks to Hive's AS clause, it's possible to extract field from JSON as column col.
Coral translates the following Hive SQL to Trino:
select regexp_extract('1a 2b 14m', '\d+', 1);
Result:
select regexp_extract('1a 2b 14m', "hive_pattern_to_presto"('\d+'), 1);
But Trino does not have a function named "hive_pattern_to_presto", so the query fails with the exception:
Function 'hive_pattern_to_trino' not registered
So, why was the "hive_pattern_to_trino" function introduced? Is it implemented internally at LinkedIn?
For a given Hive SQL with LATERAL VIEW, the generated RelNode representation has LogicalCorrelate / LogicalJoin depending on the input SQL's syntax.
For input Hive SQL - SELECT col FROM (SELECT ARRAY('a1', 'a2') as a) LATERAL VIEW EXPLODE(a) a_alias AS col
RelNode representation is -
LogicalProject(col=[$1])
LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
LogicalProject(a=[ARRAY('a1', 'a2')])
LogicalValues(tuples=[[{ 0 }]])
HiveUncollect
LogicalProject(col=[$cor0.a])
LogicalValues(tuples=[[{ 0 }]])
For input SQL - SELECT arr.alias FROM foo LATERAL VIEW EXPLODE(ARRAY('a', 'b')) arr as alias
RelNode representation -
LogicalProject(alias=[$3])
LogicalJoin(condition=[true], joinType=[inner])
LogicalTableScan(table=[[hive, default, foo]])
HiveUncollect
LogicalProject(col=[ARRAY('a', 'b')])
LogicalValues(tuples=[[{ 0 }]])
Both translations should result only in LogicalCorrelate in the RelNode representation.
To reproduce this issue, add this test to ViewToAvroSchemaConverterTests and run it:
@Test
public void foo() {
String viewSql = "CREATE VIEW v AS SELECT id as id FROM basecomplex";
TestUtils.executeCreateViewQuery("default", "v", viewSql);
ViewToAvroSchemaConverter viewToAvroSchemaConverter = ViewToAvroSchemaConverter.create(hiveMetastoreClient);
Schema actualSchema = viewToAvroSchemaConverter.toAvroSchema("default", "v", false, false);
System.out.println(actualSchema);
}
the output will be:
{"type":"record","name":"v","namespace":"default.v","fields":[{"name":"Id","type":"int"}]}
which shows the column name is Id rather than the explicit lower-cased alias id.
At present, in ParseTreeBuilder.visitLateralViewExplode, we assume LATERAL VIEW is used with EXPLODE only. It should be made more generic to support other UDTFs.
When bumping the Coral dependency version in Trino, we noticed that coral-trino has a new dependency on trino-parser.
One of our maintainers brought up the fact that trino-parser is not designed to be depended on by other projects, i.e. it is not guaranteed to maintain backward compatibility. To preempt future incompatibilities, would the folks at Coral consider shading the trino-parser dependency?
For reference, the conversation is here.
I am having a problem with sql translation to Trino that results in a query with an identifier column.
SELECT numbers , animal , n , a
FROM (
VALUES
( ARRAY [ 2 , 5 ], ARRAY [ 'dog' , 'cat' , 'bird' ]),
( ARRAY [ 7 , 8 , 9 ], ARRAY [ 'cow' , 'pig' ])
) AS x ( numbers , animals )
CROSS JOIN UNNEST ( numbers, 动物) AS t ( n , a );
While adding support for window functions, I noticed an issue in the current design of ParseTreeBuilder that caused the following query:
SELECT `a`, RANK() OVER (PARTITION BY `b` ORDER BY `c`) FROM `foo`
to be translated into a Calcite SqlNode tree as the following:
SELECT `a`, RANK() OVER (PARTITION BY `b` ORDER BY `c`) FROM `foo` ORDER BY `c`
The extra "ORDER BY `c`" at the end of the query is due to the following issue in the current design of ParseTreeBuilder.
com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder has the following method, which uses a context to pass information around while converting the Hive ASTNode tree to a Calcite SqlNode tree:
private SqlNode processAST(ASTNode node, @Nullable Table hiveView) {
ParseContext ctx = new ParseContext(hiveView);
return visit(node, ctx);
}
In the following function, it picks up "order by" clause by looking it up via the context:
@Override
protected SqlNode visitQueryNode(ASTNode node, ParseContext ctx) {
ArrayList<Node> children = node.getChildren();
checkState(children != null && !children.isEmpty());
SqlNode cte = null;
ParseContext qc = new ParseContext(ctx.getHiveTable().orElse(null));
for (Node child : node.getChildren()) {
ASTNode ast = (ASTNode) child;
if (ast.getType() == HiveParser.TOK_CTE) {
// Child of type TOK_CTE represents the "WITH" list
/** See {@link #visitCTE(ASTNode, ParseContext) visitCTE} for the return value */
cte = visit(ast, new ParseContext(null));
} else {
// The return values are ignored since all other children of the SELECT query will be captured via ParseContext qc.
visit(ast, qc);
}
}
SqlSelect select = new SqlSelect(ZERO, qc.keywords, qc.selects, qc.from, qc.where, qc.grpBy, qc.having, null,
qc.orderBy, null, qc.fetch);
if (cte != null) {
// Calcite uses "SqlWith(SqlNodeList of SqlWithItem, SqlSelect)" to represent queries with WITH
/** See {@link #visitCTE(ASTNode, ParseContext) visitCTE} for details */
return new SqlWith(ZERO, (SqlNodeList) cte, select);
} else {
return select;
}
}
So any "ORDER BY" anywhere within the SELECT clause will be picked up.
To fix the issue, I would like to get rid of the use of the context and instead pass the orderBy SqlNode directly via the return value of the visit function.
Any thoughts @wmoustafa @funcheetah @antumbde ?
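The leak is easy to reproduce outside Calcite. The toy sketch below (hypothetical classes, not Coral's) contrasts the two designs: stashing an ORDER BY key into a shared mutable field lets a window's ORDER BY escape into the enclosing query, while returning the clause from the visit keeps it scoped.

```java
// Toy sketch (hypothetical, not Coral's classes) contrasting the two visitor designs.
public class ContextLeakDemo {
    // Shared-context style: any ORDER BY seen anywhere lands in one mutable field.
    static String sharedOrderBy = null;

    static void visitWindowSpec(String orderKey) {
        sharedOrderBy = orderKey; // the window's ORDER BY leaks into the enclosing query
    }

    static String buildWithSharedContext() {
        visitWindowSpec("c"); // visiting the window spec inside the SELECT list
        return "SELECT a, RANK() OVER (PARTITION BY b ORDER BY c) FROM foo"
                + (sharedOrderBy != null ? " ORDER BY " + sharedOrderBy : "");
    }

    // Return-value style: the clause is returned to the caller that owns it,
    // so the window's ORDER BY never reaches the outer query.
    static String visitOrderKey(String key) {
        return key;
    }

    static String buildWithReturnValues() {
        String windowSpec = "OVER (PARTITION BY b ORDER BY " + visitOrderKey("c") + ")";
        return "SELECT a, RANK() " + windowSpec + " FROM foo";
    }

    public static void main(String[] args) {
        System.out.println(buildWithSharedContext()); // spurious trailing ORDER BY c
        System.out.println(buildWithReturnValues());  // no trailing ORDER BY
    }
}
```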
If we create a Hive view with columns that are not explicitly named in the SQL, Hive will autogenerate column names for such columns when persisting the view to the metastore. E.g., consider the following view:
hive> CREATE VIEW v AS SELECT TRUE, lower('STR') FROM some_table;
Here is the persisted information in the metastore
hive> DESC FORMATTED v;
OK
# col_name data_type comment
_c0 boolean
_c1 string
.
.
.
# View Information
View Original Text: SELECT TRUE, lower('STR') FROM some_table
View Expanded Text: SELECT TRUE, lower('STR') FROM `some_table`
Trying to retrieve the view schema using coral-schema for this view returns EXPR_0 and EXPR_1 as column names, which I assume are auto-generated by Calcite. I think we should respect the column names provided by Hive here, i.e. _c0 and _c1, since they have been persisted as column metadata in the metastore.
Can you address a couple of instances of language in the project where the wording can be made more inclusive? (Example: whitelist -> allowlist.) We are trying to make the code in all LinkedIn projects more inclusive. Could you please examine whether these need to be updated, and make the changes? For suggested replacements see go/inclusivelanguage or Google. THANK YOU!
Term | URL |
---|---|
Blacklist | https://github.com/linkedin/coral/blob/b436653fe94c1b6bab30fcf7c9fd1869e2461afd/coral-spark/src/main/java/com/linkedin/coral/spark/UnsupportedHiveUDFsInSpark.java |
Can Coral provide stable APIs to translate expressions from Hive to Presto?
Use case (as requested by Praveen Krishna):
Presto allows connectors to provide row filters in terms of expressions. It's possible that they're defined in HiveQL, and Presto needs to read and transform them before applying them. For this purpose, the ability to translate expressions directly is useful.
For example, Presto can be configured so that for queries by user U on table tpch.tiny.orders, rows with orderKey >= 10 are excluded. This is defined as an expression written as orderKey < 10 in the configuration. Currently, Presto requires this expression to be in a form that Presto can understand. If Coral allowed Hive --> Presto translation of such expressions, they could be expressed as Hive expressions and translated at runtime.
A repeated error came up while testing TrinoToRelConverter: coral.shading.io.trino.sql.parser.ParsingException: line 1:8: backquoted identifiers are not supported; use double quotes to quote identifiers. The issue is that during the Trino-to-Rel conversion, if we expand the view, backquotes are added and the query is then rejected by the Trino SqlParser.
The SQL below throws an exception:
SELECT date('2021-01-02') as a
com.linkedin.coral.hive.hive2rel.functions.UnknownSqlFunctionException: Unknown function name: date
at com.linkedin.coral.hive.hive2rel.functions.HiveFunctionResolver.tryResolve(HiveFunctionResolver.java:124)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitFunctionInternal(ParseTreeBuilder.java:572)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitFunction(ParseTreeBuilder.java:563)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitFunction(ParseTreeBuilder.java:81)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:122)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.lambda$visitChildren$0(AbstractASTVisitor.java:286)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:286)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:282)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitSelectExpr(ParseTreeBuilder.java:580)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitSelectExpr(ParseTreeBuilder.java:81)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:98)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.lambda$visitChildren$0(AbstractASTVisitor.java:286)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:286)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:282)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitSelect(ParseTreeBuilder.java:605)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitSelect(ParseTreeBuilder.java:81)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:95)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.lambda$visitChildren$0(AbstractASTVisitor.java:286)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:286)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:282)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitInsert(AbstractASTVisitor.java:522)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:86)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.lambda$visitChildren$0(AbstractASTVisitor.java:286)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:286)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:282)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitQueryNode(ParseTreeBuilder.java:733)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitQueryNode(ParseTreeBuilder.java:81)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:66)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.lambda$visitChildren$0(AbstractASTVisitor.java:286)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:286)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:282)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitNil(ParseTreeBuilder.java:739)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitNil(ParseTreeBuilder.java:81)
at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:47)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.processAST(ParseTreeBuilder.java:176)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.process(ParseTreeBuilder.java:168)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.processSql(ParseTreeBuilder.java:161)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilderTest.convert(ParseTreeBuilderTest.java:145)
at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilderTest.testConvertAndValidate(ParseTreeBuilderTest.java:127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:701)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:893)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1218)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
at org.testng.TestRunner.privateRun(TestRunner.java:758)
at org.testng.TestRunner.run(TestRunner.java:613)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291)
at org.testng.SuiteRunner.run(SuiteRunner.java:240)
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:87)
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1170)
at org.testng.TestNG.runSuitesLocally(TestNG.java:1095)
at org.testng.TestNG.run(TestNG.java:1007)
at com.intellij.rt.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:66)
at com.intellij.rt.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:109)
Relates to trinodb/trino#8789
Can Presto SQL be converted to Hive SQL?
We have SQL that uses || for string concatenation, e.g. select 'a' || 'b', but Coral does not support this syntax. Can someone help support this feature?
Coral version: 2.0.86
SQL: select '2022-01-01' rlike '^\\d{4}-\\d{2}-\\d{2}$|^\\d{4}-\\d{2}$' returns true in Hive.
However, Coral's translation result is SELECT "REGEXP_LIKE"('2022-01-01', '^\\d{4}-\\d{2}-\\d{2}$|^\\d{4}-\\d{2}$') FROM (VALUES (0)) AS "t" ("ZERO"), which returns false in Trino. The expected translation result is SELECT "REGEXP_LIKE"('2022-01-01', '^\d{4}-\d{2}-\d{2}$|^\d{4}-\d{2}$') FROM (VALUES (0)) AS "t" ("ZERO") (each double backslash should be translated to a single backslash).
Am I using it wrong, or is this a bug?
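For context on why the doubled backslash matters: Hive processes backslash escapes inside single-quoted literals, so '\\d' reaches the regex engine as \d (a digit class), whereas Trino string literals keep backslashes verbatim. A standalone Java sketch (java.util.regex, not Coral code) of the two patterns the engine ends up seeing:

```java
import java.util.regex.Pattern;

public class BackslashDemo {
    public static void main(String[] args) {
        String input = "2022-01-01";

        // What Hive hands to the regex engine for '...\\d{2}...':
        // the escape is processed, so the engine sees \d (a digit class).
        String processed = "^\\d{4}-\\d{2}-\\d{2}$"; // Java source "\\d" == regex \d

        // What Trino's engine sees if the doubled backslash is kept verbatim:
        // a literal backslash followed by repeated 'd', which matches nothing here.
        String verbatim = "^\\\\d{4}-\\\\d{2}-\\\\d{2}$"; // regex \\d == literal '\' + 'd'

        System.out.println(Pattern.matches(processed, input)); // true
        System.out.println(Pattern.matches(verbatim, input));  // false
    }
}
```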
After adding trino2rel, there are a few classes that both coral-hive and coral-trino use that can now be moved to coral-common, such as DaliOperatorTable.
These changes will also be helpful when adding future support for other SqlLanguage2rel conversions.
Hive SQL containing window functions is not supported.
Exception:
Unhandled Hive AST token TOK_WINDOWSPEC, tree:
`
TOK_WINDOWSPEC
TOK_PARTITIONINGSPEC
TOK_DISTRIBUTEBY
TOK_TABLE_OR_COL
c1
TOK_ORDERBY
TOK_TABSORTCOLNAMEDESC
TOK_TABLE_OR_COL
c2
`
for example:
column | type
c0 | string
c1 | string
c2 | int
select c0, c1, c2, lead(c0, 1) over ( PARTITION BY c1 ORDER BY c2 desc ) c3 FROM hive.tablex
SELECT c0, c1, ROW_NUMBER() over ( PARTITION BY c0 ORDER BY c1 desc ) rw FROM hive.tablex
Currently dynamic UDF resolution is only testable through integration tests as it relies on communicating with an Artifact Management Repository, typically an external service. At this point, integration tests are not open source either. This issue is to track creating a unit test that tests this functionality by creating a local artifact management repository and leveraging it in unit tests.
The Hive view is defined as
CREATE VIEW v1 AS SELECT NULL AS c1 FROM t
When using it in Trino it throws an error
Unknown type 'NULL' for column 'c1' in view: hive.default.v1
It looks like Hive has the type void for NULL, while Trino doesn't, and Trino still enforces a type for every column.
A workaround is to modify the Hive view definition by adding an explicit cast (to any type valid in both Hive and Trino):
CREATE VIEW v1 AS SELECT CAST(NULL AS INT) AS c1 FROM t
This will be accepted by Trino and still works in Hive.
Regarding the translation from Hive to Trino, does it make sense to convert a plain NULL column into a CAST?
OperatorTransformer is a re-implementation of the UDFTransformer functionality, but at the SqlNode level.
Currently, during Hive -> Spark/Trino SQL translations, Coral creates a RelNode plan. The TableScan RelNode has the representation LogicalTableScan(table=[[hive, dbName, tableName]]).
When this RelNode is translated to a SqlNode, "hive" (the catalog name) has to be removed explicitly.
Filing this issue to investigate whether introducing the catalog name in the RelNode can be avoided.
Translation of struct<> field references to Presto fails with 1.0.25 (released 4 days ago).
Sample:
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.calcite.rel.RelNode;

import com.linkedin.coral.hive.hive2rel.HiveToRelConverter;
import com.linkedin.coral.presto.rel2presto.RelToPrestoConverter;

public class Main {
  public static void main(String[] args) throws Exception {
    // Local metastore: database "foo" with table "bar" holding a struct column
    Map<String, Map<String, List<String>>> localMetaStore = new HashMap<>();
    Map<String, List<String>> tables = new HashMap<>();
    tables.put("bar", Arrays.asList("col|struct<i:int,s:string>"));
    localMetaStore.put("foo", tables);
    HiveToRelConverter converter = HiveToRelConverter.create(localMetaStore);
    RelToPrestoConverter presto = new RelToPrestoConverter();
    // Selecting the whole struct works
    RelNode rel = converter.convertSql("select col from foo.bar");
    System.out.println(presto.convert(rel));
    // Referencing 'i' in struct 'col' fails
    rel = converter.convertSql("select col.i from foo.bar");
    System.out.println(presto.convert(rel));
  }
}
Stacktrace:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
SELECT "col"
FROM "foo"."bar"
Exception in thread "main" org.apache.calcite.runtime.CalciteContextException: At line 0, column 0: Table 'col' not found
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:463)
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:834)
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:819)
at org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:4867)
at org.apache.calcite.sql.validate.DelegatingScope.fullyQualify(DelegatingScope.java:370)
at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:5759)
at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:5744)
at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:317)
at org.apache.calcite.sql.validate.SqlValidatorImpl.expand(SqlValidatorImpl.java:5351)
at com.linkedin.coral.hive.hive2rel.HiveSqlValidator.expand(HiveSqlValidator.java:61)
at org.apache.calcite.sql.validate.SqlValidatorImpl.expandSelectItem(SqlValidatorImpl.java:447)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelectList(SqlValidatorImpl.java:4104)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3392)
at org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1005)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:965)
at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:216)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:940)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:647)
at com.linkedin.coral.hive.hive2rel.HiveSqlToRelConverter.convertQuery(HiveSqlToRelConverter.java:57)
at com.linkedin.coral.hive.hive2rel.HiveToRelConverter.toRel(HiveToRelConverter.java:126)
at com.linkedin.coral.hive.hive2rel.HiveToRelConverter.convertSql(HiveToRelConverter.java:81)
at com.vmware.cb.product.metrics.Main.main(Main.java:51)
Caused by: org.apache.calcite.sql.validate.SqlValidatorException: Table 'col' not found
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:463)
at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:572)
... 23 more
Currently Coral uses Hive 1.0; are there any plans to support other Hive versions?
This causes issues for systems on Hive 2.1.0 and above: for example, several date functions such as date_add had their function signatures changed in the 2.1.0 release. Reference: https://cwiki.apache.org/confluence/display/hive/languagemanual+udf
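To illustrate the kind of signature change involved: before Hive 2.1.0, date_add('2008-12-31', 1) returned a string, while from 2.1.0 on it returns a date value. The snippet below only models the two return types with java.time; it does not call Hive:

```java
import java.time.LocalDate;

public class DateAddSemantics {
    // Hive < 2.1.0: date_add returns a STRING such as '2009-01-01'
    static String dateAddPre210(String startDate, int days) {
        return LocalDate.parse(startDate).plusDays(days).toString();
    }

    // Hive >= 2.1.0: date_add returns a DATE value
    static LocalDate dateAddPost210(String startDate, int days) {
        return LocalDate.parse(startDate).plusDays(days);
    }

    public static void main(String[] args) {
        System.out.println(dateAddPre210("2008-12-31", 1));  // 2009-01-01 (as a string)
        System.out.println(dateAddPost210("2008-12-31", 1)); // 2009-01-01 (as a date)
    }
}
```

A translator pinned to the Hive 1.0 signatures can therefore infer the wrong result type for such functions when the source runs a newer Hive.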
Currently Coral only supports translations for select kinds of queries. If we want to translate CTAS queries, it throws an error during AST node to SqlNode conversion.