google / zetasql Goto Github PK

ZetaSQL - Analyzer Framework for SQL

License: Apache License 2.0

Python 2.71% C++ 87.56% Java 6.10% Yacc 1.48% Lex 0.20% Starlark 1.91% C 0.03% Dockerfile 0.01%

zetasql's Introduction

ZetaSQL - Analyzer Framework for SQL

ZetaSQL defines a language (grammar, types, data model, and semantics) as well as a parser and analyzer. It is not itself a database or query engine. Instead it is intended to be used by multiple engines wanting to provide consistent behavior for all semantic analysis, name resolution, type checking, implicit casting, etc. Specific query engines may not implement all features in the ZetaSQL language and may give errors if specific features are not supported. For example, engine A may not support any updates and engine B may not support analytic functions.

ZetaSQL Language Guide

ZetaSQL ResolvedAST API

ZetaSQL BigQuery Analysis Example

Status of Project and Roadmap

This codebase is being open sourced in multiple phases:

Parser and Analyzer Complete
Reference Implementation In Progress
- Base capability Complete
- Function library In Progress
Compliance Tests Complete
- includes framework for validating compliance of arbitrary engines
Misc tooling
- Improved Formatter Complete

Multiplatform support is planned for the following platforms:

Linux (Ubuntu 20.04 is our reference platform, but others may work).
- gcc-9+ is required, recent versions of clang may work.
MacOS (Experimental)
Windows (version TDB)

We do not provide any guarantees of API stability and cannot accept contributions.

Flags

ZetaSQL uses the Abseil Flags library to handle commandline flags. Unless otherwise documented, all flags are for debugging purposes only and may change, stop working or be removed at any time.

How to Build

ZetaSQL uses bazel for building and dependency resolution. After installing bazel (check .bazelversion for the specific version of bazel we test with, but other versions may work), simply run:

bazel build ...

If your Mac build fails due the python error ModuleNotFoundError: no module named 'google.protobuf', run pip install protobuf==<version> to install python protobuf first. The protobuf version can be found in the zetasql_deps_step_2.bzl file.

How to add as a Dependency in bazel

See the (WORKSPACE) file, as it is a little unusual.

With docker

TODO: Add docker build instructions.

Example Usage

A very basic command line tool is available to run simple queries with the reference implementation: bazel run //zetasql/tools/execute_query:execute_query -- "select 1 + 1;"

The reference implementation is not yet completely released and currently supports only a subset of functions and types.

Differential Privacy

For questions, documentation and examples of ZetaSQLs implementation of Differential Privacy, please check out (https://github.com/google/differential-privacy).

Versions

ZetaSQL makes no guarantees regarding compatibility between releases. Breaking changes may be made at any time. Our releases are numbered based on the date of the commit the release is cut from. The number format is YYYY.MM.n, where YYYY is the year, MM is the two digit month, and n is a sequence number within the time period.

License

Apache License 2.0

Support Disclaimer

This is not an officially supported Google product.

zetasql's People

Stargazers

Watchers

Forkers

mbrukman dengziming chubbymaggie hustnn mrotteveel sabino linecode offgrid-konrad apstndb yupbank jianlirong tangingw krisskross shollyman wenxiaozxw aocsa mcunha discord beckjake rubik-ai shuod viirya hunglethanh9 igordzreyev vdt beamjl fuath emuhedo hhy5277 kamyu104 w7yuu shaunstanislauslau b-xiang averroes luohao yaozhaoming scsldb prajamani threatintel-c mo36924 chenfengli1985 lucyio ryanmcdowell martinsahlen dictav aljoscha ambition119 zhaoyaogit hiloboy0119 emaxerrno vglocus georgewfraser duanshuaimin ptangphao sinhasantos padod gbeanvamp janhicken csenel videoamp liancheng zeta1999 cejutue feiyunwill matts966 jshute111 devoteamgcloud hamidmoq kabeer27 xuqianjin-stars traintravel mingen-pan blockspacer diwakarvish morganwu277 getdataset orhanuysal-google fadimounir isabella232 antfitch romainr neotim hypi-universe andyg24 jasonginus cdx08222028 fbird2020 apitestsx yymm guoqingiskid org-mars hitum-dev jacobstocklass fumiyukikato fivetran aceforeverd sambacha 4paradigm xiaozan-dev allpedroza

zetasql's Issues

Extract created table/view table name without building table catalog

Hi.

I am looking for a way to extract the table/view name from table/view create statements without building a table catalog and analyzing the statement.

For example, consider the following:

String sql = "create or replace table project2.dataset2.table2 as SELECT column1 from project1.dataset1.table1";
List<List<String>> tables = Analyzer.extractTableNamesFromStatement(sql, ZetaSqlAnalyzerFactory.getAnalyzerOptions());

With the above, I am able to retrieve the names of the tables involved in the query, in this case project1.dataset1.table1. What I am looking for is how to extract the created table project2.dataset2.table2 without building the table catalog and analyzing the statement. I can retrieve them building a table catalog and analyzing the statement, my issue is that I am not always able to build the table catalog, so I am looking for a way to bypass this step.

Best regards

Reconstruct a subquery from AST without formatting

I parse a query to an AST and would like to print out one of its child nodes to its original string. I try to use Unparse(child_node) but it will format a new string. Is there a way to print its original form?

Here is my basic idea:

ParseStatement(sql, options.GetParserOptions(), &parser_output)
auto child_node =parser_output->statement()->child(0)->child(0)->child(1);
std::cout << Unparse(child_node); // this method will format the string

For example, the original sql is select * from x join y on x.id = y.id. And I hope to extract its FROM clause, which should be x join y on x.id = y.id in its original form.

FORMAT() is not a recognized function

I get an error whenever I try to parse a query which makes use of the FORMAT function. Ive added all available functions to the catalog (even though i cannot see one for formatting). Has anyone else had and solved this issue?

Dealing with scripting in big query sql

Assume I have something like following in .sql file:

DECLARE var1 STRING;
DECLARE var2 INT64;
SET var1 = (
SELECT GUID
FROM project.dataset.table1
WHERE some_conditions
LIMIT 1
);
SET var2 = (
SELECT A.B
FROM project.dataset.table2
WHERE some_conditions
LIMIT 1
);

INSERT project.dataset.table3
VALUES (value1, value2, value3);

Based on my knowledge that ZetaSQL is not supporting scripting, so I am wondering what's the best way to deal with scripting while parsing the .sql files with ZetaSQL Java implementation.

Thanks,

buildStatement of a ResolvedStatement doesn't work when there is a tree of catalogs

I am trying out a very simple case where I build the sql statement again after I got the resolvedStatement. Below some test code:

public class Main {

    private static String SQL = "SELECT * FROM test_dataset.test_table;";

    public static void main(String[] args) {
        SimpleCatalog catalog = new SimpleCatalog("catalog");
        SimpleCatalog datasetCatalog = new SimpleCatalog("test_dataset");
        SimpleTable simpleTable = new SimpleTable("test_table");
        simpleTable.addSimpleColumn("colA", TypeFactory.createSimpleType(ZetaSQLType.TypeKind.TYPE_STRING));
        simpleTable.addSimpleColumn("colB", TypeFactory.createSimpleType(ZetaSQLType.TypeKind.TYPE_STRING));
        simpleTable.addSimpleColumn("colC", TypeFactory.createSimpleType(ZetaSQLType.TypeKind.TYPE_STRING));
        datasetCatalog.addSimpleTable(simpleTable);
        catalog.addSimpleCatalog(datasetCatalog);

        AnalyzerOptions options = new AnalyzerOptions();
        ResolvedNodes.ResolvedStatement resolvedStatement = Analyzer.analyzeStatement(SQL, options, catalog);

        String statementString = Analyzer.buildStatement(resolvedStatement, catalog);

        System.out.println(statementString);
    }
}

I get the following error:

FATAL  statusor.cc : 38 : Attempting to fetch value instead of handling error NOT_FOUND: Table not found: test_table not found in catalog catalog

When I use datasetCatalog in the Analyzer.buildStatement(), I don't get an error but obviously the dataset info is not present.

Why not use antlr4 to describe the Grammar?

Missing SqlBuilder Java Classes

Currently, the Java classes provide the possibility to analyze a SQL statement and get a resolved AST. However, the it is not possible to create a SQL statement from the resolved AST, because the required Java classes are missing (even though the corresponding C++ classes are available), i.e. SqlBuilder.java and its dependencies.

Latest docker build script for zetaSQL 2020.04.1

Description

I've been trying to create a parser in the same manner that @apstnb built the format server (https://github.com/apstndb/zetasql-format-server). In particular, I'm interested in using the function Run programmatically (and not from the command-line) (It exists inside /experimental/execute_query.cc). In this way, I'd like a docker container that exposes the Run method via a small golang server and I can push a sql statement and get the required AST back.

Has anyone here created a Dockerfile that successfully builds and compiles the latest zetaSQL, similar to https://github.com/apstndb/zetasql-format-server/, but using the latest zetasql (2020.04.1), and not the version in WORKSPACE of the above repo, given below:

http_archive(
   name = "com_google_zetasql",
   strip_prefix = "zetasql-2019.07.1",
   urls = [
      "https://github.com/google/zetasql/archive/2019.07.1.tar.gz",
   ],
   sha256 = "88871fe511cce67e86321f9eb69cd8f81218a9415f41dff36b066a23b483bf06"
)

The request for help

I've found it quite hard to build the latest zetaSQL using the example in zetasql-format-server as a base. Any help with formulating a Dockerfile that successfully exposes an endpoint that takes a SQL and returns an AST of this SQL would be greatly appreciated.

TVFSignature.java lacks implementation

See title.

/** TODO: Add implementation. */
public final class TVFSignature implements Serializable {

com.google.zetasql.SqlException: Syntax error: Unexpected "-" [at 1:34]

Hi, I have started to use Dataflow SQL. When I run following query ı have trouble with date whose format is YYYY-MM-DD.

Executing SQL query:
select * from bigquery.table.coca-cola-data-lake.SPARC.sparc_output_daily where transfer_date = "2019-11-10"

Java Wrapper getting error: Statement not supported: CreateTableStatement

Hi, I am using the ZetaSQL Java wrapper to extract the tables used in a SQL query. When the query contains a "Create table" statement, I am getting a Statement not supported: CreateTableStatement.

Is there any way for me to enable "create table" statements when analyzing the query?

Code sample to reproduce:

String sql = "create or replace table project2.dataset2.table2 as SELECT column1 from project1.dataset1.table1";
SimpleCatalog catalog = new SimpleCatalog("tableCatalog");
SimpleCatalog catalProject = catalog.addNewSimpleCatalog("project1");
SimpleCatalog catalDataset = catalProject.addNewSimpleCatalog("dataset1");
SimpleColumn column1 = new SimpleColumn("project1.dataset1.table1", "column1", TypeFactory.createSimpleType(ZetaSQLType.TypeKind.TYPE_STRING));
SimpleTable simpleTable1 = new SimpleTable("table1", Arrays.asList(column1));
catalDataset.addSimpleTable(simpleTable1);
catalog.addZetaSQLFunctions(new ZetaSQLBuiltinFunctionOptions());
LanguageOptions languageOptions = new LanguageOptions();
languageOptions.enableMaximumLanguageFeatures();
AnalyzerOptions options = new AnalyzerOptions();
options.setPruneUnusedColumns(true);
options.setLanguageOptions(languageOptions);
Analyzer analyzer = new Analyzer(options, catalog);
analyzer.analyzeStatement(sql);

Running the above throws: INVALID_ARGUMENT: Statement not supported: CreateTableStatement [at 1:1]

Thank you

How to use?

Could you easily explain how to use?
How to connect the parser to my Command line interface
Is there any sample?
Thanks

Qualified function calls result in "Function not found"

When I parse this query with SqlAnalyzer::analyzeNextStatement, it succeeds:

CREATE FUNCTION foo.bar.baz() AS (1); SELECT baz();

But this query fails:

CREATE FUNCTION foo.bar.baz() AS (1); SELECT foo.bar.baz();

It looks like singleton and non-singleton paths are handled differently, and non-singleton paths don't work.

zetasql/zetasql/public/catalog.cc

Lines 120 to 128 in 3e5d33d

 if (path.size() > 1) { 

 Catalog* catalog = nullptr; 

 ZETASQL_RETURN_IF_ERROR(GetCatalog(name, &catalog, options)); 

 if (catalog == nullptr) { 

 return FunctionNotFoundError(path); 

 } 

 const absl::Span<const std::string> path_suffix = 

 path.subspan(1, path.size() - 1); 

 return catalog->FindFunction(path_suffix, function, options);

No matching signature for function CONCAT for argument types: DATE, DATE. How to add signature?

Hi there,
I have a BigQuery query that I am trying to parse using the ZetaSql Java wrapper. The query contains a CONCAT function call with two date types as argument, however, when I try to parse it I get the error:

INVALID_ARGUMENT: No matching signature for function CONCAT for argument types: DATE, DATE. Supported signatures: CONCAT(STRING, [STRING, ...]); CONCAT(BYTES, [BYTES, ...]) [at 1:8]

example of the query:

select concat(current_date(), current_date())

How can I add this signature to the function?
Thank you

create temp function not supported

Are creating temp functions in the SQL statement not supported?
I used Analyzer.extractTableNamesFromStatement for this query:

CREATE TEMPORARY FUNCTION plusone(value INT64)
RETURNS INT64
LANGUAGE js AS """
  return parseInt(value) + 1;
""";

SELECT plusone(1) AS two;

gives an error
com.google.zetasql.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Syntax error: Unexpected keyword SELECT

Build crashing

I'm trying to build inside a ubuntu 18.04 docker container. My goal is to produce a working VSCode container for working on zetasql. The build keeps crashing with messages like:

ERROR: /workspaces/zetasql/zetasql/local_service/BUILD:101:1: Building zetasql/local_service/liblocal_service_proto-speed.jar (1 source jar) failed: Worker process quit or closed its stdin stream when we tried to send a WorkRequest:

---8<---8<--- Exception details ---8<---8<---
java.io.IOException: Stream closed
	at java.base/java.lang.ProcessBuilder$NullOutputStream.write(Unknown Source)
	at java.base/java.io.OutputStream.write(Unknown Source)
	at java.base/java.io.BufferedOutputStream.flushBuffer(Unknown Source)
	at java.base/java.io.BufferedOutputStream.write(Unknown Source)
	at com.google.protobuf.CodedOutputStream$OutputStreamEncoder.doFlush(CodedOutputStream.java:3003)
	at com.google.protobuf.CodedOutputStream$OutputStreamEncoder.write(CodedOutputStream.java:2935)
	at com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeLazy(CodedOutputStream.java:2954)
	at com.google.protobuf.ByteString$LiteralByteString.writeTo(ByteString.java:1331)
	at com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeBytesNoTag(CodedOutputStream.java:2751)
	at com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeBytes(CodedOutputStream.java:2724)
	at com.google.devtools.build.lib.worker.WorkerProtocol$Input.writeTo(WorkerProtocol.java:223)
	at com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeMessageNoTag(CodedOutputStream.java:2800)
	at com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeMessage(CodedOutputStream.java:2775)
	at com.google.devtools.build.lib.worker.WorkerProtocol$WorkRequest.writeTo(WorkerProtocol.java:1009)
	at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:98)
	at com.google.devtools.build.lib.worker.WorkerSpawnRunner.execInWorker(WorkerSpawnRunner.java:330)
	at com.google.devtools.build.lib.worker.WorkerSpawnRunner.actuallyExec(WorkerSpawnRunner.java:172)
	at com.google.devtools.build.lib.worker.WorkerSpawnRunner.exec(WorkerSpawnRunner.java:121)
	at com.google.devtools.build.lib.exec.SpawnRunner.execAsync(SpawnRunner.java:225)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:123)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:88)
	at com.google.devtools.build.lib.actions.SpawnActionContext.beginExecution(SpawnActionContext.java:41)
	at com.google.devtools.build.lib.exec.ProxySpawnActionContext.beginExecution(ProxySpawnActionContext.java:60)
	at com.google.devtools.build.lib.actions.SpawnContinuation$1.execute(SpawnContinuation.java:80)
	at com.google.devtools.build.lib.rules.java.JavaCompileAction$JavaActionContinuation.execute(JavaCompileAction.java:495)
	at com.google.devtools.build.lib.rules.java.JavaCompileAction.beginExecution(JavaCompileAction.java:315)
	at com.google.devtools.build.lib.actions.Action.execute(Action.java:123)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$4.execute(SkyframeActionExecutor.java:832)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:966)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:938)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:114)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:78)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:562)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:710)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:256)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:450)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:387)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
---8<---8<--- End of exception details ---8<---8<---

How can I troubleshoot this?

MacOS support

I know that MacOS support is on the roadmap. But I wanted to make a separate ticket out of it to create some visibility.

The thing is that ZetaSQL is now supported in Apache Beam and it's hard to contribute to Beam code that supports Zeta when running it on a Mac.

Timezone issue in AnalyzerOptions

Hiya,

Thanks for releasing this publicly! I've been digging through the codebase for the past few weeks. My colleague and I noticed this small "bug" in analyzer.cc, when initializing analyzerOptions.

CHECK(LoadTimeZone("America/Los_Angeles", &default_timezone_));

How to change statement

After parsing some query

String sql = "SELECT column1 FROM dataset1.table1 AS t1";
ResolvedQueryStmt statement = (ResolvedQueryStmt) Analyzer.analyzeStatement(sql, options, catalog);

I need to change the statement for example by

adding new column t1.column2 AS c2
adding new condition for where clause c2 = 1

Then by this code

String sql = Analyzer.buildStatement(statement, catalog);

I would like to receive changed query

SELECT column1, t1.column2 AS c2 FROM dataset1.table1 AS t1 WHERE c2 = 1

Is it possible?
Where can I found an examples in Java?
Thanks.

[Analyzer] No way to analyze scripts

It seems that ZetaSQL supports whole-script parsing (Parser::ParseScript), which, significant for my purpose, allows analysis of scripts with toplevel DECLARE statements, as is also possible within BigQuery.
But the Analyzer API (in C++ as well as Java) does not provide an interface to this functionality, so I'm stuck with having to wrap my scripts with a superfluous BEGIN/END block.
It'd be nice if this where a supported feature.

CreateTableStament, InsertStatement, CreateModelStatement not supported

When trying to parse a CREATE TABLE, INSERT or CREATE MODEL statement (using the com.google.zetasql.Analyzer Java class), we get

Statement not supported: CreateTableStatement [at 2:1]
com.google.zetasql.SqlException: Statement not supported: CreateTableStatement [at 2:1]

Statement not supported: InsertStatement [at 2:1]
com.google.zetasql.SqlException: Statement not supported: InsertStatement [at 2:1]

Statement not supported: CreateModelStatement [at 1:1]
com.google.zetasql.SqlException: Statement not supported: CreateModelStatement [at 1:1]

Do you guys plan to support these statement? If so, do you have an eta?

ML functions are missing

Signatures of TVFs for ML (ML.PREDICT, ML.EVALUATE,...) are not provided by the corresponding java classes, i.e.. ZetaSQLFunction.FunctionSignatureId.values() does not contain these function signatures.

Analyzing statements with named parameters

If I try to analyze a statement like this:

select @extract_time_str as SyncExtractTime

I get this error: Query parameter 'extract_time_str' not found.
Is there any way to handle named parameters?

How to set Analyzer not to accept "UNION ALL" in query statement?

Hello,

I am using Analyzer and have:

analyzer.analyzeStatement(query)

How can I make it not to accept UNION ALL in query statement?

InvalidArgument: Table not found: iden

Where to provide database configuration?

Example documentation or code

Hey
I work with BigQuery at work, and I have been trying to analyze BigQuery sql statements. I want to parse them and create the AST trees. I got it working using the https://github.com/apstndb/zetasql-sandbox , but I run into problems creating the catalog. Do you guys have any example documentation on creating the catalog from BigQuery API or maybe even just mocking it

Thanks
Javier

docker build scripts.

Scripts for building under docker, preserving symlinks and permissions

docker_build.sh

#!/bin/sh

user=$(id -u -n)
group=$(id -g -n)
uid=$(id -u)
gid=$(id -g)

docker  build -t zetasql-build                  \
        --build-arg uid=$uid                    \
        --build-arg gid=$gid                    \
        --build-arg user=$user                  \
        --build-arg group=$group                \
        --build-arg home=$HOME                  \
        --build-arg pwd=`pwd`                  \
        docker/

## set this to 1 or 2 less than the cores...  
docker run --cpus 8  --rm                                               \
       --mount type=bind,source=$HOME/.cache,destination=$HOME/.cache   \
       --mount type=bind,source=`pwd`,destination=`pwd`               \
       -i -t                                                            \
       zetasql-build

docker/Dockerfile

FROM ubuntu:latest
## Should be combined when stable....
RUN apt-get -y update
RUN apt-get -y install wget pkg-config zip g++ zlib1g-dev unzip python3 git make bash-completion python apt-utils
RUN wget https://github.com/bazelbuild/bazel/releases/download/0.26.1/bazel_0.26.1-linux-x86_64.deb
RUN dpkg -i bazel_0.26.1-linux-x86_64.deb
RUN apt-get -y install  openjdk-8-jdk-headless
RUN apt-get -y install python3-distutils
RUN apt-get install --fix-broken

## PROVIDED FROM COMMAND LINE
ARG uid
ARG gid
ARG user
ARG group
ARG home
ARG pwd

ENV HOME=${home}
## users group is already present  remove before adding
RUN groupdel users &&  groupadd -g ${gid} ${group} &&  useradd  -M -d $HOME -g ${gid} -u ${uid}  ${user}

USER ${user}
WORKDIR ${pwd}

CMD bazel build zetasql/...
#CMD /bin/bash

SimpleModel.java lacks setter methods

It is not possible to create a catalog containing ML Models (using com.google.zetasql.SimpleModel) because the class does not provide the setters needed. Consequently, it is not possible to validate/analyze queries using models.

ParseLocationRange is not populated for all Resolved Nodes

Hello,

We are looking to use ParseLocationRange from the proto objects. However these values are only populated for a small portion of resolved nodes. For example,

ResolvedScan objects may include the source location set to include the query as expected or it could be missing.
For a ResolvedTVFScan, location range only includes function name instead of whole expression.
For the remainder of objects, only ResolvedLiteral and ResolvedFunctionArgument seem to be correctly populated.

Is it possible to extract source location for the remainder of the ResolvedNode objects?

Thank you.

AS statements

Hi,

Ive been using the zetasql's Resolved Statement to analyze sql statements and extract data lineage. The one tyoe of visitor i cannot find seems to be one which would resolve "x as y" statements that do not have the "WITH" keyword up front. Have I just not stumbled upon the correct visitor, or is this functionality not implemented?

Best,
Kira

ZetaSQL Java 2020.09.1 Release

Hi can I have a update note for ZetaSQL Java 2020.09.1 Release?

How to extract connection in AS statements

Hi,

I'm trying to parse the following query through the java implementation:

WITH test_table AS (
  SELECT
  column_source as column_alias
  FROM `table_in_catalog`
)

SELECT column_alias FROM test_table

But I'm unable to find a visitor that connects column_source with column_alias.
I know that you can get the name-alias relationship with resolvedoutputcolumn, but that only works for the output columns of the whole query, not with the ones within CTE table...

Is there any way to extract this connection?

Thank you very much!

ResolvedNodes.ResolvedTableScan shouldn't require forSystemTimeExpr to be not null

Generated code validates that forSystemTimeExpr isn't null:

protected void validate() {
    super.validate();
    Preconditions.checkArgument(this.table != null, "table must be set");
    Preconditions.checkArgument(this.forSystemTimeExpr != null, "forSystemTimeExpr must be set");
    Preconditions.checkArgument(this.alias != null, "alias must be set");
}

While NULL is a valid case if system time functionality isn't used.

Querying across partitions

In BigQuery I can write statements like this:

select * from dataset1.table_* where _TABLE_SUFFIX = '201910'

To dynamically query multiple partitions/tables.
This fails as an unrecognized table in the analyzer. Is there any support for this syntax in zetasql?

How to get a parse tree?

Is there any code example to show how to use this library? How can I get a parse tree for a sql query?

Function equal not found in a join statement

Analyzer.analyzeStatement gives the following error when giving a sql statement with an inner join:

com.google.zetasql.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Function not found: `$equal` [at 1:153]

The sql statement:

SELECT tb.* FROM `project-id`.test_dataset.test_view as vw INNER JOIN `project-id`.test_dataset.test_table as tb ON tb.colA = vw.colA;

I also add all the language features this way:

LanguageOptions languageOptions = new LanguageOptions();
languageOptions.enableMaximumLanguageFeatures();
AnalyzerOptions options = new AnalyzerOptions();
options.setLanguageOptions(languageOptions);

ResolvedStatement rs = Analyzer.analyzeStatement(sql, options, catalog);

The code works without a join.

What do I do wrong?

Implicit aliases in FROM statement not working

Hello, I have the following query form

select column
  from `projcet1.dataset1.tablename1`
  left join dataset2.tablename2
  on column = tablename2.column
  ...

From the documentation (https://github.com/google/zetasql/blob/master/docs/query-syntax.md#implicit-aliases): "For path expressions, the alias is the last identifier in the path. For example, FROM abc.def.ghi implies AS ghi", therefore, I would expect an alias to be created with the name tablename2

However, when I run:

ResolvedNodes.ResolvedStatement statement = analyzer.analyzeStatement(sql);

I get the error, com.google.zetasql.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Unrecognized name: tablename2 - The same query executes successfully in BigQuery.
I have the following language options enabled:

catalog.addZetaSQLFunctions(new ZetaSQLBuiltinFunctionOptions());
LanguageOptions languageOptions = new LanguageOptions();
languageOptions.enableMaximumLanguageFeatures();
AnalyzerOptions options = new AnalyzerOptions();
options.setPruneUnusedColumns(true);
options.setLanguageOptions(languageOptions);
return new Analyzer(options, catalog)

Someone know if I have to enable some additional language settings, or how to achieve the expected behavior?
Thanks!

Table-valued functions are not supported

Table-valued functions are not supported [at 3:9]
com.google.zetasql.SqlException: Table-valued functions are not supported [at 3:9]

Do you plan to support table-valued functions? If so, do you have an eta?

Float formatting in SQLBuilder is locale specific

I am using the SQLBuilder to produce SQL from a ResolvedAST. The AST contains very small floating point literals, e. g. 0.000146511, which are rendered as 146511E-4 by default.

It seems that the SQL Builder eventually uses standard C printf to produce a string for a given floating point, which will use a comma as decimal separator for certain locales, e. g. de_DE.UTF-8.

The above literal will produce the SQL expression (0, 146511) which is of type STRUCT<INT, INT> instead of FLOAT.

As a workaround, I set LC_NUMERIC=C, which fixes this behaviour.

Comments are stripped on SqlFormatter.formatSql()

Comments are stripped on SqlFormatter.formatSql().
Enabling include_comments is intended behavior to implement formatter.

ResolvedNodes.java doesn't seem to properly setFullName in TableRefProto

ResolvedNodes.java doesn't seem to properly setFullName in TableRefProto.

It causes errors in C++ code that relies on having fullName

[BUG] "java.lang.IllegalStateException: No ZetaSQL ClientChannelProvider loaded."

Hi,

I am getting this error

java.lang.IllegalStateException: No ZetaSQL ClientChannelProvider loaded.
 at com.google.zetasql.ClientChannelProvider.loadChannel(ClientChannelProvider.java:34)
 at com.google.zetasql.Client.getStub(Client.java:29)
 at com.google.zetasql.LanguageOptions.getDefaultFeatures(LanguageOptions.java:58)
 at com.google.zetasql.LanguageOptions.<init>(LanguageOptions.java:66)
 at com.google.zetasql.AnalyzerOptions.<init>(AnalyzerOptions.java:52)
 at com.google.zetasql.Analyzer.extractTableNamesFromStatement(Analyzer.java:183)

when running the following code

private void testZeta(String query) {
   try {
       List<List<String>> tableNames = Analyzer.extractTableNamesFromStatement(query);
   } catch (Exception e) {
       logger.error(e.getMessage());
   }
}

My pom.xml is like this

<dependency>
  <groupId>com.google.zetasql</groupId>
  <artifactId>zetasql-client</artifactId>
  <version>2020.09.1</version>
</dependency>
<dependency>
  <groupId>com.google.zetasql</groupId>
  <artifactId>zetasql-jni-channel-darwin</artifactId>
  <version>2020.09.1</version>
</dependency>

I tried zetasql-jni-channel instead of zetasql-jni-channel-darwin and got the same error. I am running it on Mac. Any idea why I get this error and how I can solve it?
Does this work on Macos?

SimpleColumn.java lacks public constructors

The constructors of SimpleColumn.java are package-protected although the class itself is public. Hence, the constructors should also be public.

Cannot JOIN TVF

zetasql version 2020.03.2.
Using zetasql-jni-channel-darwin.

If I try to JOIN the output of two tvfs (e.g., SELECT * FROM tvf(Table table) JOIN tvf(Table table) USING (id), I get the following errors (depending on if I use USING or ON):

Using USING:

Caused by: com.google.zetasql.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Column id in USING has incompatible types on either side of the join: INT64 and INT64 [at 1:71]

Using ON:

Caused by: com.google.zetasql.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Function not found: `$equal` [at 1:68]

How to reproduce:

public void testJoin() {

    LanguageOptions languageOptions = new LanguageOptions();
    languageOptions.enableLanguageFeature(
        ZetaSQLOptions.LanguageFeature.FEATURE_TABLE_VALUED_FUNCTIONS);

    AnalyzerOptions analyzerOptions = new AnalyzerOptions();
    analyzerOptions.setLanguageOptions(languageOptions);

    SimpleCatalog catalog = new SimpleCatalog("catalog");
    Analyzer analyzer = new Analyzer(analyzerOptions, catalog);

    FunctionArgumentType tableType =
        new FunctionArgumentType(ZetaSQLFunctions.SignatureArgumentKind.ARG_TYPE_RELATION);
    TableValuedFunction tvf =
        new TableValuedFunction.ForwardInputSchemaToOutputSchemaTVF(
            ImmutableList.of("tvf"),
            new FunctionSignature(tableType, ImmutableList.of(tableType), /* contextId= */ -1));
    catalog.addTableValuedFunction(tvf);

    SimpleTable table = new SimpleTable("table");
    table.addSimpleColumn("id", createSimpleType(ZetaSQLType.TypeKind.TYPE_INT64));
    catalog.addSimpleTable(table);

    try {
    analyzer.analyzeStatement(
        "SELECT * FROM "
        + "tvf(TABLE table) AS a JOIN "
        + "tvf(TABLE table) AS b USING (id)");
    } catch (com.google.zetasql.SqlException e) {
      // Column id in USING has incompatible types on either side of the join: INT64 and INT64
      e.printStackTrace();
    }

    try {
      analyzer.analyzeStatement(
          "SELECT * FROM "
          + "tvf(TABLE table) AS a JOIN "
          + "tvf(TABLE table) AS b ON (a.id = b.id)");
    } catch (com.google.zetasql.SqlException e) {
      // Function not found: `$equal`
      e.printStackTrace();
    }

  }

I've tried:

Using TableValuedFunction.ForwardInputSchemaToOutputSchemaTVF.
Using the output of analyzer.analyzeStatement("CREATE TABLE FUNCTION tvf(table ANY TABLE);") to build the TableValuedFunction object.
Joining tvf and table. Same errors as before.

Analytic functions not supported

I am trying to analyze a sql statement that include BigQuery functions lag and lead but am getting this error. Am I doing something wrong or is this not yet supported? I see that these functions exist in ZetaSQLFunction.java

16:20:29.885 [DEBUG] [TestEventLogger] zetasqltest > test FAILED
16:20:29.885 [DEBUG] [TestEventLogger]     com.google.zetasql.SqlException: Analytic functions not supported [at 6:8]
16:20:29.885 [DEBUG] [TestEventLogger]         at com.google.zetasql.Analyzer.analyzeStatement(Analyzer.java:60)
16:20:29.885 [DEBUG] [TestEventLogger]         at com.google.zetasql.Analyzer.analyzeStatement(Analyzer.java:46)
16:20:29.885 [DEBUG] [TestEventLogger]         at zetasqltest.test(zetasqltest.java:387)
16:20:29.885 [DEBUG] [TestEventLogger]         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
16:20:29.885 [DEBUG] [TestEventLogger]         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
16:20:29.885 [DEBUG] [TestEventLogger]         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
16:20:29.885 [DEBUG] [TestEventLogger]         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
16:20:29.885 [DEBUG] [TestEventLogger]         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
16:20:29.885 [DEBUG] [TestEventLogger]         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
16:20:29.885 [DEBUG] [TestEventLogger]         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
16:20:29.885 [DEBUG] [TestEventLogger]         at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
16:20:29.885 [DEBUG] [TestEventLogger]         at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
16:20:29.886 [DEBUG] [TestEventLogger]         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
16:20:29.886 [DEBUG] [TestEventLogger]         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
16:20:29.886 [DEBUG] [TestEventLogger]         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
16:20:29.886 [DEBUG] [TestEventLogger]         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
16:20:29.886 [DEBUG] [TestEventLogger]         at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
16:20:29.886 [DEBUG] [TestEventLogger]         at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117)
16:20:29.886 [DEBUG] [TestEventLogger]         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
16:20:29.886 [DEBUG] [TestEventLogger]         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
16:20:29.886 [DEBUG] [TestEventLogger]         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
16:20:29.886 [DEBUG] [TestEventLogger]         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
16:20:29.887 [DEBUG] [TestEventLogger]         at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
16:20:29.887 [DEBUG] [TestEventLogger]         at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
16:20:29.887 [DEBUG] [TestEventLogger]         at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:155)
16:20:29.887 [DEBUG] [TestEventLogger]         at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:137)
16:20:29.887 [DEBUG] [TestEventLogger]         at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
16:20:29.887 [DEBUG] [TestEventLogger]         at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
16:20:29.887 [DEBUG] [TestEventLogger]         at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
16:20:29.887 [DEBUG] [TestEventLogger]         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
16:20:29.887 [DEBUG] [TestEventLogger]         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
16:20:29.887 [DEBUG] [TestEventLogger]         at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
16:20:29.887 [DEBUG] [TestEventLogger]         at java.base/java.lang.Thread.run(Thread.java:834)
16:20:29.887 [DEBUG] [TestEventLogger] 
16:20:29.887 [DEBUG] [TestEventLogger]         Caused by:
16:20:29.887 [DEBUG] [TestEventLogger]         com.google.zetasql.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Analytic functions not supported [at 6:8]
16:20:29.887 [DEBUG] [TestEventLogger]             at com.google.zetasql.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233)
16:20:29.887 [DEBUG] [TestEventLogger]             at com.google.zetasql.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214)
16:20:29.887 [DEBUG] [TestEventLogger]             at com.google.zetasql.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139)
16:20:29.887 [DEBUG] [TestEventLogger]             at com.google.zetasql.ZetaSqlLocalServiceGrpc$ZetaSqlLocalServiceBlockingStub.analyze(ZetaSqlLocalServiceGrpc.java:1063)
16:20:29.887 [DEBUG] [TestEventLogger]             at com.google.zetasql.Analyzer.analyzeStatement(Analyzer.java:58)
16:20:29.887 [DEBUG] [TestEventLogger]             ... 49 more

Failure on repeated JNI connection

Hi! I'm trying to run ZetaSQL analyzer from a Scala macro, to make some query validation at compile time. It works wonderfully on the first compilation, but when I try to change code (or just clean) and compile again it crashes with this error message:

[libprotobuf FATAL external/com_google_protobuf/src/google/protobuf/extension_set.cc:93] 
Multiple extension registrations for type "google.protobuf.FieldOptions", field number 68711883.

libc++abi.dylib: terminating with uncaught exception of type google::protobuf::FatalException: 
Multiple extension registrations for type "google.protobuf.FieldOptions", field number 68711883.

This can be reproduced without any macros, just doing a normal runtime call to the ZetaSQL API with the build tool (sbt) trying to reuse the same JVM. It works with forking JVM for each run, but it's not a solution for the original use case (calling ZetaSQL at compile time).

I tried a few things, including reimplementing JniChannelProvider in Scala to be able to localize the problem and potentially tweak its behavior. Unfortunately, I don't know what I could (safely) tweak to make it work. As far as I understood the problem is not in the JniChannelProvider itself, but in the JNI library being loaded twice. I thought that maybe I can avoid that, so I tried this:

extract the .dylib file and load the library from that stable location (in contrast to how cz.adamh.utils.NativeUtils extracts it to a new temp location every time), that led to

UnsatisfiedLinkError: Native Library [...]/liblocal_service_jni.dylib already loaded in another classloader
load it conditionally by checking if the classloader already has it (after the first run), that led me to

UnsatisfiedLinkError: static.experiment.SocketProvider.getSocketChannel()Ljava/nio/channels/SocketChannel;

So from that I'm deducing that the problem is in the native getSocketChannel method. I'm hoping to get some insights in the way it works. I tried to look at zetasql/local_service/local_service_jni.cc but didn't understand much. Maybe it can be made idempotent to avoid "multiple extension registrations"? Or maybe I'm doing it all wrong and should somehow reuse the connection?

I can provide more details and some code if that helps. I would appreciate any advice or information.

Does not build on bazel 1.0

I have tried to build on mac:

georgefraser@gfraser zetasql % bazel build ...
INFO: Writing tracer profile to '/private/var/tmp/_bazel_georgefraser/cabbf0dc045c08c79f73ceb9158c0915/command.profile.gz'
DEBUG: /private/var/tmp/_bazel_georgefraser/cabbf0dc045c08c79f73ceb9158c0915/external/io_grpc_grpc_java/java_grpc_library.bzl:184:9: Multi
ple values in 'deps' is deprecated in local_service_java_grpc
INFO: Call stack for the definition of repository 'io_grpc_grpc_netty' which is a jvm_import_external (rule definition at /private/var/tmp
/_bazel_georgefraser/cabbf0dc045c08c79f73ceb9158c0915/external/bazel_tools/tools/build_defs/repo/jvm.bzl:203:23):
 - /private/var/tmp/_bazel_georgefraser/cabbf0dc045c08c79f73ceb9158c0915/external/bazel_tools/tools/build_defs/repo/jvm.bzl:263:5
 - /Users/georgefraser/Documents/zetasql/zetasql_deps.bzl:402:9
 - /Users/georgefraser/Documents/zetasql/WORKSPACE:49:1
ERROR: An error occurred during the fetch of repository 'io_grpc_grpc_netty':
   java.io.IOException: No URLs left after removing plain http URLs due to missing checksum. Please provde either a checksum or an https d
ownload location.
INFO: Call stack for the definition of repository 'io_opencensus_opencensus_api' which is a jvm_import_external (rule definition at /priva
te/var/tmp/_bazel_georgefraser/cabbf0dc045c08c79f73ceb9158c0915/external/bazel_tools/tools/build_defs/repo/jvm.bzl:203:23):
 - /private/var/tmp/_bazel_georgefraser/cabbf0dc045c08c79f73ceb9158c0915/external/bazel_tools/tools/build_defs/repo/jvm.bzl:263:5
 - /Users/georgefraser/Documents/zetasql/zetasql_deps.bzl:455:9
 - /Users/georgefraser/Documents/zetasql/WORKSPACE:49:1
INFO: Call stack for the definition of repository 'io_opencensus_opencensus_contrib_grpc_metrics' which is a jvm_import_external (rule def
inition at /private/var/tmp/_bazel_georgefraser/cabbf0dc045c08c79f73ceb9158c0915/external/bazel_tools/tools/build_defs/repo/jvm.bzl:203:23
):
 - /private/var/tmp/_bazel_georgefraser/cabbf0dc045c08c79f73ceb9158c0915/external/bazel_tools/tools/build_defs/repo/jvm.bzl:263:5
 - /Users/georgefraser/Documents/zetasql/zetasql_deps.bzl:465:9
 - /Users/georgefraser/Documents/zetasql/WORKSPACE:49:1
ERROR: /Users/georgefraser/Documents/zetasql/java/com/google/zetasql/BUILD:266:1: //java/com/google/zetasql:client_jar depends on @io_grpc
_grpc_netty//jar:jar in repository @io_grpc_grpc_netty which failed to fetch. no such package '@io_grpc_grpc_netty//jar': java.io.IOExcept
ion: No URLs left after removing plain http URLs due to missing checksum. Please provde either a checksum or an https download location.
ERROR: Analysis of target '//java/com/google/zetasql:client_jar' failed; build aborted: no such package '@io_grpc_grpc_netty//jar': java.i
o.IOException: No URLs left after removing plain http URLs due to missing checksum. Please provde either a checksum or an https download l
ocation.
INFO: Elapsed time: 6.411s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (40 packages loaded, 30 targets configured)
    Fetching @native_utils; fetching
georgefraser@gfraser zetasql % bazel version
Build label: 1.1.0-homebrew
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Mon Oct 21 21:09:45 2019 (1571692185)
Build timestamp: 1571692185
Build timestamp as int: 1571692185
georgefraser@gfraser zetasql %

and on linux:

george_w_fraser@cloudshell:~/zetasql (analog-delight-604)$ bazel build ...
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Writing tracer profile to '/home/george_w_fraser/.cache/bazel/_bazel_george_w_fraser/dd7c07d963b307c31b23d63706115009/command.profile.gz'
DEBUG: /home/george_w_fraser/.cache/bazel/_bazel_george_w_fraser/dd7c07d963b307c31b23d63706115009/external/io_grpc_grpc_java/java_grpc_library.bzl:184:9: Multip
le values in 'deps' is deprecated in local_service_java_grpc
WARNING: Download from http://bazel-mirror.storage.googleapis.com/repo1.maven.org/maven2/com/google/guava/failureaccess/1.0.1/failureaccess-1.0.1.jar failed: cl
ass com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
INFO: Call stack for the definition of repository 'io_grpc_grpc_core' which is a jvm_import_external (rule definition at /home/george_w_fraser/.cache/bazel/_baz
el_george_w_fraser/dd7c07d963b307c31b23d63706115009/external/bazel_tools/tools/build_defs/repo/jvm.bzl:203:23):
 - /home/george_w_fraser/.cache/bazel/_bazel_george_w_fraser/dd7c07d963b307c31b23d63706115009/external/bazel_tools/tools/build_defs/repo/jvm.bzl:263:5
 - /home/george_w_fraser/zetasql/zetasql_deps.bzl:390:9
 - /home/george_w_fraser/zetasql/WORKSPACE:49:1
ERROR: An error occurred during the fetch of repository 'io_grpc_grpc_core':
   java.io.IOException: No URLs left after removing plain http URLs due to missing checksum. Please provde either a checksum or an https download location.
ERROR: /home/george_w_fraser/zetasql/java/com/google/zetasql/BUILD:251:1: //java/com/google/zetasql:analyzer depends on @io_grpc_grpc_core//jar:jar in repositor
y @io_grpc_grpc_core which failed to fetch. no such package '@io_grpc_grpc_core//jar': java.io.IOException: No URLs left after removing plain http URLs due to m
issing checksum. Please provde either a checksum or an https download location.
ERROR: Analysis of target '//java/com/google/zetasql:analyzer' failed; build aborted: no such package '@io_grpc_grpc_core//jar': java.io.IOException: No URLs le
ft after removing plain http URLs due to missing checksum. Please provde either a checksum or an https download location.
INFO: Elapsed time: 93.929s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (77 packages loaded, 1397 targets configured)
george_w_fraser@cloudshell:~/zetasql (analog-delight-604)$ bazel version
Build label: 1.0.0
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Oct 10 10:16:08 2019 (1570702568)
Build timestamp: 1570702568
Build timestamp as int: 1570702568
george_w_fraser@cloudshell:~/zetasql (analog-delight-604)$

Too many levels of nesting

Hello!

I'm trying to run the static Java method Analyzer.analyzeStatement over a quite long select statement with multiple CTEs, and I'm getting the following error:

Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message had too many levels of nesting. May be malicious. Use CodedInputStream.setRecursionLimit() to increase the depth limit.
The catalog is properly built and everything works perfectly with smaller queries.

Is it possible to increase this recursion limit or a workaround to parse long SQL statements?

Thank you and Merry Christmas!

StructType.toString throws UnsupportedOperationException

Doing String.format or any other function that calls toString fails on a struct type.

java.lang.UnsupportedOperationException
	at com.google.zetasql.ZetaSQLStrings.toIdentifierLiteral(ZetaSQLStrings.java:39)
	at com.google.zetasql.StructType.debugString(StructType.java:177)
	at com.google.zetasql.Type.toString(Type.java:334)
	at java.util.Formatter$FormatSpecifier.printString(Formatter.java:2886)
	at java.util.Formatter$FormatSpecifier.print(Formatter.java:2763)
	at java.util.Formatter.format(Formatter.java:2520)
	at java.util.Formatter.format(Formatter.java:2455)
	at java.lang.String.format(String.java:2942)

How to extract the full table identifier from a Struct column with the Java wrapper

Hello, I have the following BigQuery SQL statement which queries a single struct column:

select teststruct.col1 from project1.dataset1.table1

I am trying to parse out the full identifier (with table) of the column, i.e., project1.dataset1.table1.teststruct.col1 but thus far I have only succeeded parsing out table1.teststruct.col1.

The following will outline my approach, followed by a full example to reproduce the above:
First, I am adding catalogs for each of the table identifiers, followed by the struct column and the table:

// create struct type
SimpleType structFieldType = TypeFactory.createSimpleType(ZetaSQLMap.SIMPLE_TYPE_KIND.get("string"));
StructType testStruct = TypeFactory.createStructType(Collections.singletonList(new StructType.StructField("col1", structFieldType)));
SimpleColumn structCol = new SimpleColumn("project1.dataset1.table1", "teststruct", testStruct);

// create table catalog
SimpleCatalog catalog = new SimpleCatalog("tableCatalog");
SimpleCatalog catalogProject = catalog.addNewSimpleCatalog("project1");
SimpleCatalog catalogDataset = catalogProject.addNewSimpleCatalog("dataset1");
SimpleTable table = new SimpleTable("table1", Collections.singletonList(structCol));
catalogDataset.addSimpleTable(table);

Adding language options, initializing Analyzer:

catalog.addZetaSQLFunctions(new ZetaSQLBuiltinFunctionOptions());
LanguageOptions languageOptions = new LanguageOptions();
languageOptions.enableMaximumLanguageFeatures();
AnalyzerOptions options = new AnalyzerOptions();
options.setLanguageOptions(languageOptions);
Analyzer analyzer = new Analyzer(options, catalog);

Finally, retrieving the ResolvedStatment from the analyzer and implement a ResolvedNode Visitor for the ResolvedGetStructField, which:

Retrieves the name of the struct field, col1 in this case.
Resolves the column of the struct field, teststruct in this case, and retrieves the table name, table1 in this case. My issue is here: how do I retrieve the full identifier for the tablename, i.e. project1.dataset1.table1

Note: the below example is just a working example for this particular use case and will not work for multiple nested struct fields.

ResolvedNodes.ResolvedStatement statement = analyzer.analyzeStatement(sql);
ResolvedNodes.Visitor visitor = new ResolvedNodes.Visitor() {
	@Override
	public void visit(ResolvedNodes.ResolvedGetStructField structField) {

		// get struct field index so that we can look up the name of the field
		int structFieldIdx = (int) structField.getFieldIdx();
		ResolvedNodes.ResolvedColumnRef resolvedColumnRef = (ResolvedNodes.ResolvedColumnRef) structField.getExpr();
		// get list of fields in struct and extract name using struct field index
		ImmutableList<StructType.StructField> fieldList = ((StructType) resolvedColumnRef.getType()).getFieldList();
		String fieldName = fieldList.get(structFieldIdx).getName();

                // get struct column and table name
		ResolvedColumn resolvedColumn = resolvedColumnRef.getColumn();
		String structRoot = resolvedColumn.getName();
		String tableName = resolvedColumn.getTableName();
					
		// fullIdentifier will be 'table1.teststruct.col1', how to get full table identifier
		String fullIdentifier = String.join(".", Arrays.asList(tableName, structRoot, fieldName));
		System.out.println(fullIdentifier);

	}
};
statement.accept(visitor);

Anyone know how I can extract the full table identifier along with the column? Thank you.

Full example to reproduce:

String sql = "select teststruct.col1 from project1.dataset1.table1";

// create struct type
SimpleType structFieldType = TypeFactory.createSimpleType(ZetaSQLMap.SIMPLE_TYPE_KIND.get("string"));
StructType testStruct = TypeFactory.createStructType(Collections.singletonList(new StructType.StructField("col1", structFieldType)));
SimpleColumn structCol = new SimpleColumn("project1.dataset1.table1", "teststruct", testStruct);

// create table catalog
SimpleCatalog catalog = new SimpleCatalog("tableCatalog");
SimpleCatalog catalogProject = catalog.addNewSimpleCatalog("project1");
SimpleCatalog catalogDataset = catalogProject.addNewSimpleCatalog("dataset1");
SimpleTable table = new SimpleTable("table1", Collections.singletonList(structCol));
catalogDataset.addSimpleTable(table);

// add analyzer, language options
catalog.addZetaSQLFunctions(new ZetaSQLBuiltinFunctionOptions());
LanguageOptions languageOptions = new LanguageOptions();
languageOptions.enableMaximumLanguageFeatures();
AnalyzerOptions options = new AnalyzerOptions();
options.setLanguageOptions(languageOptions);
Analyzer analyzer = new Analyzer(options, catalog);

//List<List<String>> test = Analyzer.extractTableNamesFromStatement(sql);
ResolvedNodes.ResolvedStatement statement = analyzer.analyzeStatement(sql);
ResolvedNodes.Visitor visitor = new ResolvedNodes.Visitor() {
	@Override
	public void visit(ResolvedNodes.ResolvedGetStructField structField) {

		// get struct field index so that we can look up the name of the field
		int structFieldIdx = (int) structField.getFieldIdx();

		ResolvedNodes.ResolvedColumnRef resolvedColumnRef = (ResolvedNodes.ResolvedColumnRef) structField.getExpr();

		// get list of fields in struct and extract name using structfield project1.dataset1.table1
		ImmutableList<StructType.StructField> fieldList = ((StructType) resolvedColumnRef.getType()).getFieldList();
		String fieldName = fieldList.get(structFieldIdx).getName();

		ResolvedColumn resolvedColumn = resolvedColumnRef.getColumn();
		String structRoot = resolvedColumn.getName();
		String tableName = resolvedColumn.getTableName();
					
		// fullIdentifier will be 'table1.teststruct.col1' - how to extract table as 
		String fullIdentifier = String.join(".", Arrays.asList(tableName, structRoot, fieldName));
		System.out.println(fullIdentifier);

	}
};
statement.accept(visitor);

	if (path.size() > 1) {
	Catalog* catalog = nullptr;
	ZETASQL_RETURN_IF_ERROR(GetCatalog(name, &catalog, options));
	if (catalog == nullptr) {
	return FunctionNotFoundError(path);
	}
	const absl::Span<const std::string> path_suffix =
	path.subspan(1, path.size() - 1);
	return catalog->FindFunction(path_suffix, function, options);

google / zetasql Goto Github PK

zetasql's Introduction

ZetaSQL - Analyzer Framework for SQL

Status of Project and Roadmap

Flags

How to Build

How to add as a Dependency in bazel

With docker

Example Usage

Differential Privacy

Versions

License

Support Disclaimer

zetasql's People

Stargazers

Watchers

Forkers

zetasql's Issues

Description

The request for help

docker_build.sh

docker/Dockerfile

Recommend Projects

Recommend Topics

Recommend Org