
ddlparse's Introduction

DDL Parse


A Python module that parses DDL and converts it to BigQuery JSON schema and BigQuery DDL statements.


Features

  • Parses DDL and extracts table schema information.
  • Currently, only the CREATE TABLE statement is supported.
  • Converts to BigQuery JSON schema and BigQuery DDL statements.
  • Supported databases are MySQL/MariaDB, PostgreSQL, Oracle, and Redshift.

Requirement

  1. Python >= 3.5
  2. pyparsing

Installation

Install

pip install:

$ pip install ddlparse

setup.py install:

$ python setup.py install

Update

pip update:

$ pip install ddlparse --upgrade

Usage

Example

import json

from ddlparse import DdlParse

sample_ddl = """
CREATE TABLE My_Schema.Sample_Table (
  Id integer PRIMARY KEY COMMENT 'User ID',
  Name varchar(100) NOT NULL COMMENT 'User name',
  Total bigint NOT NULL,
  Avg decimal(5,1) NOT NULL,
  Point int(10) unsigned,
  Zerofill_Id integer unsigned zerofill NOT NULL,
  Created_At date, -- Oracle 'DATE' -> BigQuery 'DATETIME'
  UNIQUE (NAME)
);
"""


# parse pattern (1-1)
table = DdlParse().parse(sample_ddl)

# parse pattern (1-2) : Specify source database
table = DdlParse().parse(ddl=sample_ddl, source_database=DdlParse.DATABASE.oracle)


# parse pattern (2-1)
parser = DdlParse(sample_ddl)
table = parser.parse()

print("* BigQuery Fields * : normal")
print(table.to_bigquery_fields())


# parse pattern (2-2) : Specify source database
parser = DdlParse(ddl=sample_ddl, source_database=DdlParse.DATABASE.oracle)
table = parser.parse()


# parse pattern (3-1)
parser = DdlParse()
parser.ddl = sample_ddl
table = parser.parse()

# parse pattern (3-2) : Specify source database
parser = DdlParse()
parser.source_database = DdlParse.DATABASE.oracle
parser.ddl = sample_ddl
table = parser.parse()

print("* BigQuery Fields * : Oracle")
print(table.to_bigquery_fields())


print("* TABLE *")
print("schema = {} : name = {} : is_temp = {}".format(table.schema, table.name, table.is_temp))

print("* BigQuery Fields *")
print(table.to_bigquery_fields())

print("* BigQuery Fields - column name to lower case / upper case *")
print(table.to_bigquery_fields(DdlParse.NAME_CASE.lower))
print(table.to_bigquery_fields(DdlParse.NAME_CASE.upper))

print("* COLUMN *")
for col in table.columns.values():
    col_info = {}

    col_info["name"]                  = col.name
    col_info["data_type"]             = col.data_type
    col_info["length"]                = col.length
    col_info["precision(=length)"]    = col.precision
    col_info["scale"]                 = col.scale
    col_info["is_unsigned"]           = col.is_unsigned
    col_info["is_zerofill"]           = col.is_zerofill
    col_info["constraint"]            = col.constraint
    col_info["not_null"]              = col.not_null
    col_info["PK"]                    = col.primary_key
    col_info["unique"]                = col.unique
    col_info["auto_increment"]        = col.auto_increment
    col_info["distkey"]               = col.distkey
    col_info["sortkey"]               = col.sortkey
    col_info["encode"]                = col.encode
    col_info["default"]               = col.default
    col_info["character_set"]         = col.character_set
    col_info["bq_legacy_data_type"]   = col.bigquery_legacy_data_type
    col_info["bq_standard_data_type"] = col.bigquery_standard_data_type
    col_info["comment"]               = col.comment
    col_info["description(=comment)"] = col.description
    col_info["bigquery_field"]        = json.loads(col.to_bigquery_field())

    print(json.dumps(col_info, indent=2, ensure_ascii=False))

print("* DDL (CREATE TABLE) statements *")
print(table.to_bigquery_ddl())

print("* DDL (CREATE TABLE) statements - dataset name, table name and column name to lower case / upper case *")
print(table.to_bigquery_ddl(DdlParse.NAME_CASE.lower))
print(table.to_bigquery_ddl(DdlParse.NAME_CASE.upper))

print("* Get Column object (case insensitive) *")
print(table.columns["total"])
print(table.columns["total"].data_type)

License

BSD 3-Clause License

Author

Shinichi Takii [email protected]

Links

Special Thanks

ddlparse's People

Contributors

cyubachi, manuelzander, shinichi-takii, waltervos


ddlparse's Issues

Add Support for RedShift column_attributes

RedShift DDL:

CREATE TABLE IF NOT EXISTS "sample_schema"."sample_table"
(
	"staff_flg" CHAR(1) SORTKEY
	,"leader_flg" float(1,1)  DISTKEY
)
;

Code:

from ddlparse import DdlParse

sample_ddl = """
CREATE TABLE IF NOT EXISTS "sample_schema"."sample_table"
(
	
	"staff_flg" CHAR(1)  DEFAULT '0'::bpchar ENCODE lzo SORTKEY
	,"leader_flg" float(1,1)  DEFAULT '0'::bpchar ENCODE lzo DISTKEY
)
DISTSTYLE EVEN
;

"""

# parse pattern (1)
table = DdlParse().parse(sample_ddl)

# parse pattern (2)
parser = DdlParse()
parser.ddl = sample_ddl
table = parser.parse()

print("* TABLE *")
print("schema = {} : name = {} : is_temp = {}".format(table.schema, table.name, table.is_temp))

print("* BigQuery Fields *")
print(table.to_bigquery_fields())

print("* BigQuery Fields - column name to lower case / upper case *")
print(table.to_bigquery_fields(DdlParse.NAME_CASE.lower))
print(table.to_bigquery_fields(DdlParse.NAME_CASE.upper))

print("* COLUMN *")
for col in table.columns.values():
    print("name = {} : data_type = {} : length = {} : precision(=length) = {} : scale = {} : constraint = {} : not_null =  {} : PK =  {} : unique =  {} : BQ {}".format(
        col.name,
        col.data_type,
        col.length,
        col.precision,
        col.scale,
        col.constraint,
        col.not_null,
        col.primary_key,
        col.unique,
        col.to_bigquery_field()
        ))

Output:

* COLUMN *
name = staff_flg : data_type = CHAR : length = 1 : precision(=length) = 1 : scale = None : constraint =  : not_null =  False : PK =  False : unique =  False : BQ {"name": "staff_flg", "type": "STRING", "mode": "NULLABLE"}

It's not moving to the next column after the SORTKEY keyword.

Unable to parse MySQL character sets

ddlparse appears to be unable to correctly parse the following DDL:

CREATE TABLE `foo` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `country` char(2) CHARACTER SET latin1 NOT NULL,
  PRIMARY KEY (`id`)
);

This DDL is parsed as:

* COLUMN *
{
  "name": "id",
  "data_type": "INT",
  "length": 10,
  "precision(=length)": 10,
  "scale": null,
  "is_unsigned": true,
  "is_zerofill": false,
  "constraint": "NOT NULL",
  "not_null": true,
  "PK": false,
  "unique": false,
  "auto_increment": true,
  "distkey": false,
  "sortkey": false,
  "encode": null,
  "default": null,
  "comment": null,
  "description(=comment)": null
}
{
  "name": "country",
  "data_type": "CHAR",
  "length": 2,
  "precision(=length)": 2,
  "scale": null,
  "is_unsigned": false,
  "is_zerofill": false,
  "constraint": "",
  "not_null": false,
  "PK": false,
  "unique": false,
  "auto_increment": false,
  "distkey": false,
  "sortkey": false,
  "encode": null,
  "default": null,
  "comment": null,
  "description(=comment)": null
}
{
  "name": "CHARACTER",
  "data_type": "SET",
  "length": null,
  "precision(=length)": null,
  "scale": null,
  "is_unsigned": false,
  "is_zerofill": false,
  "constraint": "",
  "not_null": false,
  "PK": false,
  "unique": false,
  "auto_increment": false,
  "distkey": false,
  "sortkey": false,
  "encode": null,
  "default": null,
  "comment": null,
  "description(=comment)": null
}
{
  "name": "latin1",
  "data_type": "NOT",
  "length": null,
  "precision(=length)": null,
  "scale": null,
  "is_unsigned": false,
  "is_zerofill": false,
  "constraint": "",
  "not_null": false,
  "PK": false,
  "unique": false,
  "auto_increment": false,
  "distkey": false,
  "sortkey": false,
  "encode": null,
  "default": null,
  "comment": null,
  "description(=comment)": null
}

Note that the CHARACTER column as well as the latin1 column should not be there. I am running ddlparse v1.7.0.

Not able to parse Default value and Foreign Key constraint

CREATE TABLE sparkpanel.test_table1 (
  Id integer PRIMARY KEY COMMENT 'User ID',
  id1 integer ,
  Name varchar(100) NOT NULL DEFAULT 'User name',
  Total bigint NOT NULL DEFAULT 1 ,
  INSERT_DATE datetime NOT NULL DEFAULT current_timestamp(),
  INSERT_DATE1 datetime DEFAULT '2017-02-03' ,
  Avg decimal(5,1) NOT NULL,
  Created_At date, 
  FOREIGN KEY (ID1) REFERENCES Persons(PersonID),
  UNIQUE (NAME), unique(AVG)
);

Add supports column comments in BigQuery DDL

Requirements

  • Add supports column comments in BigQuery DDL.

Example

DDL example:

CREATE TABLE Sample_Table (
  Col_01 character varying(100) PRIMARY KEY COMMENT 'Single Quote',
  Col_02 text NOT NULL COMMENT "Double Quote",
  Col_03 integer COMMENT 'in "Quote"',
  Col_04 integer COMMENT "in 'Quote'",
  Col_05 integer COMMENT 'コメント is full-width(Japanese) character',
  Col_06 float -- COMMENT 'Comment out'
);

BigQuery DDL output:

#standardSQL
CREATE TABLE `project.dataset.Sample_Table`
(
  Col_01 STRING NOT NULL OPTIONS (description = "Single Quote"),
  Col_02 STRING NOT NULL OPTIONS (description = "Double Quote"),
  Col_03 INT64 OPTIONS (description = "in \"Quote\""),
  Col_04 INT64 OPTIONS (description = "in 'Quote'"),
  Col_05 INT64 OPTIONS (description = "コメント is full-width(Japanese) character"),
  Col_06 FLOAT64
)

TIME WITHOUT TIME ZONE not supported

Hello,

TIME type can be without time zone, but this case is missing in bigquery_data_type() at ddlparse/ddlparse.py:299.

Adding this solves the issue:

        BQ_DATA_TYPE_DIC["TIME"] = {None: ["TIME", "TIME WITHOUT TIME ZONE"]}
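
For reference, a minimal sketch of how to reproduce it (my own example; the table and column names are illustrative):

from ddlparse import DdlParse

sample_ddl = """
CREATE TABLE Sample_Table (
  Col_01 time without time zone
);
"""

table = DdlParse().parse(ddl=sample_ddl, source_database=DdlParse.DATABASE.postgresql)
# Expected to raise ValueError("Unknown data type ...") before the fix above.
print(table.to_bigquery_fields())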

I may do a PR when solving other issues.

Thank you!

Oracle `NUMBER(*,0)` format causes column parsing to stop

Very excited to find this -- just what I need, so thank you very much for creating it! I created a test to directly call Oracle 12.2.0.1 and run dbms_metadata.get_ddl, which is the standard method for producing DDL from the Oracle database using the CLI. It produces CREATE TABLE statements like this, which I then ran through ddlparse to convert to BigQuery -- but I found that most of the columns were missing in the resulting sql. Here's a simple runnable example where I removed the (*,0) from the first NUMBER column (COL2_WORKS):

from ddlparse.ddlparse import DdlParse


def main():

    oracle_ct = """
  CREATE TABLE "MYSCHEMA"."MYTABLE"
   (	"COL1" VARCHAR2(30) NOT NULL ENABLE,
	"COL2_WORKS" NUMBER NOT NULL ENABLE,
	"COL3" VARCHAR2(30) NOT NULL ENABLE,
	"COL4_BROKEN_AFTER_THIS" NUMBER(*,0) NOT NULL ENABLE,
	"COL5" VARCHAR2(8) NOT NULL ENABLE,
	"COL6" VARCHAR2(3) NOT NULL ENABLE
   ) ;
"""

    table = DdlParse().parse(ddl=oracle_ct, source_database=DdlParse.DATABASE.oracle)
    print(table.to_bigquery_ddl())


if __name__ == "__main__":
    main()

Note the NUMBER(*,0) designation. I found in testing on many tables that as soon as ddlparse hits any NUMBER(*,0) data type column, it stops processing any more columns, so I get BigQuery DDL like this:

#standardSQL
CREATE TABLE `project.MYSCHEMA.MYTABLE`
(
  COL1 STRING NOT NULL,
  COL2_WORKS FLOAT64 NOT NULL,
  COL3 STRING NOT NULL,
  COL4_BROKEN_AFTER_THIS FLOAT64
)

I stepped through the code/breakpoints enough to understand that the issue is happening in the original column parsing, not in the conversion to BQ DDL. The BQ conversion is only called for the 4 columns.

I suspect the * is breaking a regex in ddlparse or pyparsing. I'm out of time to figure out where, and I'd imagine you will be a lot faster at it since you are familiar with how you're using them, but I wanted to share this much since I got this far. For now, it's easily fixed by preprocessing my Oracle CREATE TABLE strings to rewrite anything with the *, as sketched below.
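
A rough sketch of that preprocessing (my own workaround, not part of ddlparse; NUMBER(38, n) is only a placeholder precision):

import re

from ddlparse.ddlparse import DdlParse

def rewrite_star_precision(ddl):
    # Rewrite Oracle's 'NUMBER(*, n)' into a fixed precision so ddlparse keeps going.
    return re.sub(r"NUMBER\s*\(\s*\*\s*,\s*(\d+)\s*\)", r"NUMBER(38,\1)", ddl, flags=re.IGNORECASE)

oracle_ct = 'CREATE TABLE "MYSCHEMA"."MYTABLE" ("COL4" NUMBER(*,0) NOT NULL ENABLE);'
table = DdlParse().parse(ddl=rewrite_star_precision(oracle_ct), source_database=DdlParse.DATABASE.oracle)
print(table.to_bigquery_ddl())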

Additional note: it is turning NUMBER(*,0) into a FLOAT64 when NUMBER(*,0) is an integer - but perhaps that's because of the broken regex as well / it's being cut off.

Thanks again for the code!

Parsing failure for columns that include a dot in default values

Requirements

  • Parsing fails for columns that include a dot in their default values.

Example code

# -*- coding: utf-8 -*-

from ddlparse import DdlParse

sample_ddl = """
CREATE TABLE Sample_Table (
  Col_01 varchar(100),
  Col_02 char(200),
  Col_03 integer,
  Col_04 double,
  Col_05 datetime,
  Col_06 decimal(2,1) not null default '0.0',
  Col_07 integer,
  CONSTRAINT const_01 PRIMARY KEY (Col_01, Col_02),
  CONSTRAINT \"const_02\" UNIQUE (Col_03, Col_04),
  CONSTRAINT \"const_03\" FOREIGN KEY (Col_04, \"Col_05\") REFERENCES ref_table_01 (\"Col_04\", Col_05)
);
"""
parser = DdlParse(sample_ddl)
table = parser.parse()


print("* BigQuery Fields * : normal")
print(table.to_bigquery_fields())

Output

Parsing stops after Col_06; Col_07 is missing from the output.

[{"name": "Col_01", "type": "STRING", "mode": "NULLABLE"},{"name": "Col_02", "type": "STRING", "mode": "NULLABLE"},{"name": "Col_03", "type": "INTEGER", "mode": "NULLABLE"},{"name": "Col_04", "type": "FLOAT", "mode": "NULLABLE"},{"name": "Col_05", "type": "DATETIME", "mode": "NULLABLE"},{"name": "Col_06", "type": "FLOAT", "mode": "REQUIRED"}]

Parse table-level COMMENT from CREATE TABLE?

I really appreciate ddlparse.
I don't see how to capture the table description from the table-level COMMENT phrase. See example below:

CREATE TABLE LOGICAL.PARTNER_AGENT
(
 Partner_Agency_Code varchar(50) NOT NULL COMMENT 'Identifying code for parter agency - firm/company selling or servicing a policy on behalf of an insured.',
 Partner_Agent_Code  varchar NOT NULL COMMENT 'Identifying code for an agent - individual selling or servicing a policy on behalf of an insured.',
 First_Name          varchar,
 Last_Name           varchar,
 Email_Address       varchar,
 CONSTRAINT Partner_Agent_PK PRIMARY KEY ( Partner_Agency_Code, Partner_Agent_Code ),
 CONSTRAINT 3392_REL FOREIGN KEY ( Partner_Agency_Code ) REFERENCES LOGICAL.PARTNER_AGENCY ( Partner_Agency_Code )
)
COMMENT = 'Agent is an individual selling or servicing a policy on behalf of an insured.';

DDL containing more than one CREATE statement

If the DDL string contains more than one CREATE statement, only the first one is parsed and the parse method always returns just one table. Support is needed for DDL containing multiple CREATE statements, with parsing returning a list of tables; a workaround sketch follows below.
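
Until that exists, a rough workaround sketch (my own code; it naively splits on ';' and so assumes no semicolons appear inside string literals or comments):

from ddlparse import DdlParse

multi_ddl = """
CREATE TABLE table_1 (id integer, name varchar(100));
CREATE TABLE table_2 (id integer, total bigint);
"""

tables = []
for statement in multi_ddl.split(";"):
    # Parse each CREATE TABLE statement on its own and collect the results.
    if "CREATE TABLE" in statement.upper():
        tables.append(DdlParse().parse(statement))

for table in tables:
    print(table.name, table.to_bigquery_fields())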

Add supports to BigQuery `NUMERIC` data type

Requirements

  • Add supports to BigQuery NUMERIC data type.

Data-type Conditions

BigQuery Data Type | Source Data Type           | Precision | Scale | Database                                               | Example Source Data Type
INT64              | NUMERIC / NUMBER / DECIMAL | < 19      | = 0   | -                                                      | NUMERIC(18)
FLOAT64            | NUMERIC / NUMBER / DECIMAL | < 19      | > 0   | -                                                      | NUMERIC(18, 1)
NUMERIC            | NUMERIC / NUMBER / DECIMAL | >= 19     | -     | -                                                      | NUMERIC(19)
NUMERIC            | NUMERIC / NUMBER / DECIMAL | *         | -     | -                                                      | NUMBER(*, 0)
INT64              | NUMERIC / NUMBER / DECIMAL | None      | -     | default                                                | DECIMAL
NUMERIC            | NUMERIC / NUMBER / DECIMAL | None      | -     | DdlParse.DATABASE.oracle, DdlParse.DATABASE.postgresql | NUMBER, NUMERIC

Example

DDL example:

CREATE TABLE Sample_Table (
  Col_01_NUMERIC_18_0 NUMERIC(18) COMMENT 'BQ type: INT64',
  Col_02_NUMERIC_18_1 NUMERIC(18, 1) COMMENT 'BQ type: FLOAT64',
  Col_03_NUMERIC_19 NUMERIC(19) COMMENT 'BQ type: NUMERIC',
  Col_04_NUMERIC_ASTERISK NUMBER(*, 0) COMMENT 'BQ type: NUMERIC',
  Col_05_NUMERIC DECIMAL COMMENT 'BQ type: Ora/Postgres - NUMERIC / Other - INT64'
);

Related issues

#45 Oracle NUMBER(*,0) format causes column parsing to stop

Support for Column names with space

Sometimes column names contain spaces, and these are not parsed correctly.

Reproduce the issue:

from ddlparse import DdlParse

sample_ddl = """
CREATE TABLE IF NOT EXISTS stage2.tbl
(
	id VARCHAR(36) NOT NULL  ENCODE RAW
		,"lead id" VARCHAR(50)   ENCODE RAW
	,PRIMARY KEY (id)
)
DISTSTYLE AUTO
;


"""

# parse pattern (1)
table = DdlParse().parse(sample_ddl)

# parse pattern (2)
parser = DdlParse()
parser.ddl = sample_ddl
table = parser.parse()

print("* TABLE *")
print("schema = {} : name = {} : is_temp = {}".format(table.schema, table.name, table.is_temp))

print("* BigQuery Fields *")
print(table.to_bigquery_fields())

print("* BigQuery Fields - column name to lower case / upper case *")
print(table.to_bigquery_fields(DdlParse.NAME_CASE.lower))
print(table.to_bigquery_fields(DdlParse.NAME_CASE.upper))

print("* COLUMN *")
for col in table.columns.values():
    print("name = {} : data_type = {} : length = {} : precision(=length) = {} : scale = {} : constraint = {} : not_null =  {} : PK =  {} : unique =  {} : BQ {}".format(
        col.name,
        col.data_type,
        col.length,
        col.precision,
        col.scale,
        col.constraint,
        col.not_null,
        col.primary_key,
        col.unique,
        col.to_bigquery_field()
        ))

Error:

Traceback (most recent call last):
  File "bqconverter.py", line 280, in <module>
    main()
  File "bqconverter.py", line 273, in main
    redshift_file_conversion(infile)
  File "bqconverter.py", line 151, in redshift_file_conversion
    target_type = data_mapping[source_type]
KeyError: 'id'

Don't fail on dot within schema

Hello there!
Thanks for a great lib, seems to work well.

One issue I've found: if the schema name contains a dot (annoying, I know, but I have no control over the naming), parsing will fail.

Minimal example:

from ddlparse import DdlParse

ddl = """ 
create table "foo.bar"."bazz" (
    id int
);
"""
DdlParse(ddl).parse()  # pyparsing.ParseException
DdlParse(ddl.replace('foo.bar', 'foo_bar')).parse()  # okay.

Exception:

pyparsing.ParseException: Expected {Suppress:({"--" Re:('.+')}) | {Suppress:("CREATE") ["TEMP"] Suppress:("TABLE") [Suppress:("IF NOT EXISTS")] [{Suppress:("") | Suppress:(""")}] [{W:(ABCD...) [{Suppress:("") | Suppress:(""")}] Suppress:(".") [{Suppress:("") | Suppress:(""")}]}] W:(ABCD...) [{Suppress:("") | Suppress:(""")}] Suppress:("(") {{{{{Suppress:({"--" Re:('.+')}) | Suppress:({"KEY" W:(ABCD...)})} | Group:({[{{{Suppress:("CONSTRAINT") [{Suppress:("") | Suppress:(""")}]} W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]}] {{{{{{{{{{"PRIMARY KEY" ^ "UNIQUE"} ^ "UNIQUE KEY"} ^ "NOT NULL"} [{Suppress:("") | Suppress:(""")}]} [W:(ABCD...)]} [{Suppress:("") | Suppress:(""")}]} Suppress:("(")} Group:({{[{Suppress:("") | Suppress:(""")}] W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]} [, {{[{Suppress:("") | Suppress:(""")}] W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]}]...)} Suppress:(")")} | {{{{"FOREIGN KEY" Suppress:("(")} Group:({{[{Suppress:("") | Suppress:(""")}] W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]} [, {{[{Suppress:("") | Suppress:(""")}] W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]}]...)} Suppress:(")")} [{{{{{{{{{Suppress:("REFERENCES") [{Suppress:("") | Suppress:(""")}]} W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]} Suppress:("(")} Group:({{[{Suppress:("") | Suppress:(""")}] W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]} [, {{[{Suppress:("") | Suppress:(""")}] W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]}]...)} Suppress:(")")} [{"MATCH" W:(ABCD...)}]} [{{"ON" "DELETE"} {{{"RESTRICT" | "CASCADE"} | "SET NULL"} | "NO ACTION"}}]} [{{"ON" "UPDATE"} {{{"RESTRICT" | "CASCADE"} | "SET NULL"} | "NO ACTION"}}]}]}}})} | Group:({{{{{{{Suppress:("") | Suppress:(""")} W:(ABCD...)} {Suppress:("") | Suppress:(""")}} ^ {{[{Suppress:("") | Suppress:(""")}] W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]}} Group:({{{Group:({W:(ABCD...) [{{{"WITHOUT TIME ZONE" ^ "WITH TIME ZONE"} ^ "PRECISION"} ^ "VARYING"}]}) [{{{Suppress:("(") Re:('[\d\]+\s,\s\d*')} [{"CHAR" | "BYTE"}]} Suppress:(")")}]} ["UNSIGNED"]} ["ZEROFILL"]})} [W:([])]} [{Re:('(?!--)') Group:({{{{{{{{[Re:('\b(?:NOT\s+)?NULL?\b')] & [Re:('\bAUTO_INCREMENT\b')]} & [Re:('\b(UNIQUE|PRIMARY)(?:\s+KEY)?\b')]} & [Re:('\bDEFAULT\b\s+(?:((?:[A-Za-z0-9_\.\'\" -\{\}]|[^\\x01-\\x7E])\:\:(?:character varying)?[A-Za-z0-9\[\]]+)|(?:\')((?:\\\'|[^\\\']|,)+)(?:\')|(?:\")((?:\\\"|[^\\"]|,)+)(?:\")|([^,\\s]+))')]} & [Re:('\bCOMMENT\b\s+(\'(\\\'|[^\\\']|,)+\'|\"(\\\"|[^\\"]|,)+\"|[^,\\s]+)')]} & [Re:('\bENCODE\s+[A-Za-z0-9]+\b')]} & ["DISTKEY"]} & ["SORTKEY"]} & [{Suppress:("CHARACTER SET") W:(ABCD...)}]})}]})} | Suppress:({"--" Re:('.+')})}}... 
[, {{{{{Suppress:({"--" Re:('.+')}) | Suppress:({"KEY" W:(ABCD...)})} | Group:({[{{{Suppress:("CONSTRAINT") [{Suppress:("") | Suppress:(""")}]} W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]}] {{{{{{{{{{"PRIMARY KEY" ^ "UNIQUE"} ^ "UNIQUE KEY"} ^ "NOT NULL"} [{Suppress:("") | Suppress:(""")}]} [W:(ABCD...)]} [{Suppress:("") | Suppress:(""")}]} Suppress:("(")} Group:({{[{Suppress:("") | Suppress:(""")}] W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]} [, {{[{Suppress:("") | Suppress:(""")}] W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]}]...)} Suppress:(")")} | {{{{"FOREIGN KEY" Suppress:("(")} Group:({{[{Suppress:("") | Suppress:(""")}] W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]} [, {{[{Suppress:("") | Suppress:(""")}] W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]}]...)} Suppress:(")")} [{{{{{{{{{Suppress:("REFERENCES") [{Suppress:("") | Suppress:(""")}]} W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]} Suppress:("(")} Group:({{[{Suppress:("") | Suppress:(""")}] W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]} [, {{[{Suppress:("") | Suppress:(""")}] W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]}]...)} Suppress:(")")} [{"MATCH" W:(ABCD...)}]} [{{"ON" "DELETE"} {{{"RESTRICT" | "CASCADE"} | "SET NULL"} | "NO ACTION"}}]} [{{"ON" "UPDATE"} {{{"RESTRICT" | "CASCADE"} | "SET NULL"} | "NO ACTION"}}]}]}}})} | Group:({{{{{{{Suppress:("") | Suppress:(""")} W:(ABCD...)} {Suppress:("") | Suppress:(""")}} ^ {{[{Suppress:("") | Suppress:(""")}] W:(ABCD...)} [{Suppress:("") | Suppress:(""")}]}} Group:({{{Group:({W:(ABCD...) [{{{"WITHOUT TIME ZONE" ^ "WITH TIME ZONE"} ^ "PRECISION"} ^ "VARYING"}]}) [{{{Suppress:("(") Re:('[\d\]+\s*,\s\d*')} [{"CHAR" | "BYTE"}]} Suppress:(")")}]} ["UNSIGNED"]} ["ZEROFILL"]})} [W:([])]} [{Re:('(?!--)') Group:({{{{{{{{[Re:('\b(?:NOT\s+)?NULL?\b')] & [Re:('\bAUTO_INCREMENT\b')]} & [Re:('\b(UNIQUE|PRIMARY)(?:\s+KEY)?\b')]} & [Re:('\bDEFAULT\b\s+(?:((?:[A-Za-z0-9_\.\'\" -\{\}]|[^\\x01-\\x7E])*\:\:(?:character varying)?[A-Za-z0-9\[\]]+)|(?:\')((?:\\\'|[^\\\']|,)+)(?:\')|(?:\")((?:\\\"|[^\\"]|,)+)(?:\")|([^,\\s]+))')]} & [Re:('\bCOMMENT\b\s+(\'(\\\'|[^\\\']|,)+\'|\"(\\\"|[^\\"]|,)+\"|[^,\\s]+)')]} & [Re:('\bENCODE\s+[A-Za-z0-9]+\b')]} & ["DISTKEY"]} & ["SORTKEY"]} & [{Suppress:("CHARACTER SET") W:(ABCD...)}]})}]})} | Suppress:({"--" Re:('.+')})}}...]...}}, found '.' (at char 22), (line:1, col:23)

Support for Sort Key

I'm not sure where to mention this; it may be a suggestion or a new feature request.

In Redshift we have SORT KEYs, which are equivalent to clustering in BigQuery.

It would be great if ddlparse detected them.

create table test
(
id int sortkey,
name varchar(10)
);

create table test2 
(
id int,
name varchar(10)
)
sortkey(id);

DDLparse:

print(col.sortkey )

id
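
As an illustration of why this would help, here is a sketch (my own code) that maps detected sort-key columns to a BigQuery clustering clause, assuming col.sortkey is set for the inline form:

from ddlparse import DdlParse

sample_ddl = """
create table test
(
id int sortkey,
name varchar(10)
);
"""

table = DdlParse().parse(sample_ddl)

# BigQuery allows at most four clustering columns.
sortkey_columns = [col.name for col in table.columns.values() if col.sortkey]
if sortkey_columns:
    print("CLUSTER BY " + ", ".join(sortkey_columns[:4]))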

Oracle NCLOB is not supported

<Internal LOBs (LOBs stored inside the Oracle Database)>
BLOB : stores binary data
CLOB : stores character data
NCLOB : stores character data in the national character set
<External LOBs (stored in OS files outside the Oracle Database tablespaces)>
BFILE : read-only (cannot be written from within an application)

Python3.7 : FutureWarning: Possible nested set at position 1

Description

Python3.7 FutureWarning message:

.tox/py37/lib/python3.7/site-packages/pyparsing.py:2910
  ddlparse/.tox/py37/lib/python3.7/site-packages/pyparsing.py:2910: FutureWarning: Possible nested set at position 1
    self.re = re.compile( self.reString )

Reference

Python3.7 Documents > re — Regular expression operations
https://docs.python.org/3.7/library/re.html

Support of nested sets and set operations as in Unicode Technical Standard #18 might be added in the future.
This would change the syntax, so to facilitate this change a FutureWarning will be raised in ambiguous cases for the time being.
That includes sets starting with a literal '[' or containing literal character sequences '--', '&&', '~~', and '||'.
To avoid a warning escape them with a backslash.
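
As an illustration of the warning itself (not ddlparse code), the difference looks like this on Python 3.7+:

import re

# A literal '[' inside a character class triggers
# "FutureWarning: Possible nested set at position 1".
re.compile("[[abc]]")

# Escaping the bracket with a backslash avoids the warning.
re.compile(r"[\[abc]]")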

Parse error for MySQL DDL with 'FOREIGN KEY'

Parse error for MySQL DDL with 'FOREIGN KEY'

Example code

# -*- coding: utf-8 -*-

from ddlparse import DdlParse

sample_ddl = """
CREATE TABLE `my_table` (
  `col_01` bigint(20) NOT NULL AUTO_INCREMENT,
  `col_02` varchar(255) DEFAULT NULL,
  `col_03` varchar(255) DEFAULT NULL,
  `col_04` text,
  `col_05` date DEFAULT NULL,
  `col_06` date DEFAULT NULL,
  `col_07` datetime NOT NULL,
  `col_08` datetime NOT NULL,
  `col_09` bigint(20) NOT NULL,
  `col_10` bigint(20) DEFAULT NULL,
  `col_11` bigint(20) DEFAULT NULL,
  `col_12` bigint(20) DEFAULT NULL,
  `col_13` tinyint(1) NOT NULL DEFAULT '0',
  `col_14` int(11) DEFAULT NULL,
  `col_15` varchar(255) DEFAULT NULL,
  `col_16` varchar(255) DEFAULT NULL,
  `col_17` text,
  `col_18` varchar(255) NOT NULL,
  `col_19` int(11) NOT NULL DEFAULT '100',
  PRIMARY KEY (`col_01`),
  UNIQUE KEY `index_cosmes_on_uuid` (`col_18`),
  KEY `index_on_col_09` (`col_09`),
  KEY `index_on_col_10` (`col_10`),
  KEY `index_on_col_11` (`col_11`),
  KEY `index_on_col_12` (`col_12`),
  CONSTRAINT `fk_rails_col_12` FOREIGN KEY (`col_12`) REFERENCES `tints` (`col_01`),
  CONSTRAINT `fk_rails_col_10` FOREIGN KEY (`col_10`) REFERENCES `tints` (`col_01`),
  CONSTRAINT `fk_rails_col_11` FOREIGN KEY (`col_11`) REFERENCES `tints` (`col_01`),
  CONSTRAINT `fk_rails_col_09` FOREIGN KEY (`col_09`) REFERENCES `series` (`col_01`)
) ENGINE=InnoDB AUTO_INCREMENT=74355 DEFAULT CHARSET=utf8mb4
"""


# parse
table = DdlParse().parse(sample_ddl)

print("* COLUMN *")
for col in table.columns.values():
    col_info = []
    col_info.append("name = {}".format(col.name))
    col_info.append("data_type = {}".format(col.data_type))
    col_info.append("length = {}".format(col.length))
    col_info.append("precision(=length) = {}".format(col.precision))
    col_info.append("scale = {}".format(col.scale))
    col_info.append("constraint = {}".format(col.constraint))
    col_info.append("not_null =  {}".format(col.not_null))
    col_info.append("PK =  {}".format(col.primary_key))
    col_info.append("unique =  {}".format(col.unique))
    col_info.append("bq_legacy_data_type =  {}".format(col.bigquery_legacy_data_type))
    col_info.append("bq_standard_data_type =  {}".format(col.bigquery_standard_data_type))
    col_info.append("BQ {}".format(col.to_bigquery_field()))
    print(" : ".join(col_info))

Output

* COLUMN *
name = col_01 : data_type = BIGINT : length = 20 : precision(=length) = 20 : scale = None : constraint = PRIMARY KEY : not_null =  True : PK =  True : unique =  False : bq_legacy_data_type =  INTEGER : bq_standard_data_type =  INT64 : BQ {"name": "col_01", "type": "INTEGER", "mode": "REQUIRED"}
name = col_02 : data_type = VARCHAR : length = 255 : precision(=length) = 255 : scale = None : constraint =  : not_null =  False : PK =  False : unique =  False : bq_legacy_data_type =  STRING : bq_standard_data_type =  STRING : BQ {"name": "col_02", "type": "STRING", "mode": "NULLABLE"}
name = col_03 : data_type = VARCHAR : length = 255 : precision(=length) = 255 : scale = None : constraint =  : not_null =  False : PK =  False : unique =  False : bq_legacy_data_type =  STRING : bq_standard_data_type =  STRING : BQ {"name": "col_03", "type": "STRING", "mode": "NULLABLE"}
name = col_04 : data_type = TEXT : length = None : precision(=length) = None : scale = None : constraint =  : not_null =  False : PK =  False : unique =  False : bq_legacy_data_type =  STRING : bq_standard_data_type =  STRING : BQ {"name": "col_04", "type": "STRING", "mode": "NULLABLE"}
name = col_05 : data_type = DATE : length = None : precision(=length) = None : scale = None : constraint =  : not_null =  False : PK =  False : unique =  False : bq_legacy_data_type =  DATE : bq_standard_data_type =  DATE : BQ {"name": "col_05", "type": "DATE", "mode": "NULLABLE"}
name = col_06 : data_type = DATE : length = None : precision(=length) = None : scale = None : constraint =  : not_null =  False : PK =  False : unique =  False : bq_legacy_data_type =  DATE : bq_standard_data_type =  DATE : BQ {"name": "col_06", "type": "DATE", "mode": "NULLABLE"}
name = col_07 : data_type = DATETIME : length = None : precision(=length) = None : scale = None : constraint = NOT NULL : not_null =  True : PK =  False : unique =  False : bq_legacy_data_type =  DATETIME : bq_standard_data_type =  DATETIME : BQ {"name": "col_07", "type": "DATETIME", "mode": "REQUIRED"}
name = col_08 : data_type = DATETIME : length = None : precision(=length) = None : scale = None : constraint = NOT NULL : not_null =  True : PK =  False : unique =  False : bq_legacy_data_type =  DATETIME : bq_standard_data_type =  DATETIME : BQ {"name": "col_08", "type": "DATETIME", "mode": "REQUIRED"}
name = col_09 : data_type = BIGINT : length = 20 : precision(=length) = 20 : scale = None : constraint = NOT NULL : not_null =  True : PK =  False : unique =  False : bq_legacy_data_type =  INTEGER : bq_standard_data_type =  INT64 : BQ {"name": "col_09", "type": "INTEGER", "mode": "REQUIRED"}
name = col_10 : data_type = BIGINT : length = 20 : precision(=length) = 20 : scale = None : constraint =  : not_null =  False : PK =  False : unique =  False : bq_legacy_data_type =  INTEGER : bq_standard_data_type =  INT64 : BQ {"name": "col_10", "type": "INTEGER", "mode": "NULLABLE"}
name = col_11 : data_type = BIGINT : length = 20 : precision(=length) = 20 : scale = None : constraint =  : not_null =  False : PK =  False : unique =  False : bq_legacy_data_type =  INTEGER : bq_standard_data_type =  INT64 : BQ {"name": "col_11", "type": "INTEGER", "mode": "NULLABLE"}
name = col_12 : data_type = BIGINT : length = 20 : precision(=length) = 20 : scale = None : constraint =  : not_null =  False : PK =  False : unique =  False : bq_legacy_data_type =  INTEGER : bq_standard_data_type =  INT64 : BQ {"name": "col_12", "type": "INTEGER", "mode": "NULLABLE"}
name = col_13 : data_type = TINYINT : length = 1 : precision(=length) = 1 : scale = None : constraint = NOT NULL : not_null =  True : PK =  False : unique =  False : bq_legacy_data_type =  INTEGER : bq_standard_data_type =  INT64 : BQ {"name": "col_13", "type": "INTEGER", "mode": "REQUIRED"}
name = col_14 : data_type = INT : length = 11 : precision(=length) = 11 : scale = None : constraint =  : not_null =  False : PK =  False : unique =  False : bq_legacy_data_type =  INTEGER : bq_standard_data_type =  INT64 : BQ {"name": "col_14", "type": "INTEGER", "mode": "NULLABLE"}
name = col_15 : data_type = VARCHAR : length = 255 : precision(=length) = 255 : scale = None : constraint =  : not_null =  False : PK =  False : unique =  False : bq_legacy_data_type =  STRING : bq_standard_data_type =  STRING : BQ {"name": "col_15", "type": "STRING", "mode": "NULLABLE"}
name = col_16 : data_type = VARCHAR : length = 255 : precision(=length) = 255 : scale = None : constraint =  : not_null =  False : PK =  False : unique =  False : bq_legacy_data_type =  STRING : bq_standard_data_type =  STRING : BQ {"name": "col_16", "type": "STRING", "mode": "NULLABLE"}
name = col_17 : data_type = TEXT : length = None : precision(=length) = None : scale = None : constraint =  : not_null =  False : PK =  False : unique =  False : bq_legacy_data_type =  STRING : bq_standard_data_type =  STRING : BQ {"name": "col_17", "type": "STRING", "mode": "NULLABLE"}
name = col_18 : data_type = VARCHAR : length = 255 : precision(=length) = 255 : scale = None : constraint = NOT NULL UNIQUE : not_null =  True : PK =  False : unique =  True : bq_legacy_data_type =  STRING : bq_standard_data_type =  STRING : BQ {"name": "col_18", "type": "STRING", "mode": "REQUIRED"}
name = col_19 : data_type = INT : length = 11 : precision(=length) = 11 : scale = None : constraint = NOT NULL : not_null =  True : PK =  False : unique =  False : bq_legacy_data_type =  INTEGER : bq_standard_data_type =  INT64 : BQ {"name": "col_19", "type": "INTEGER", "mode": "REQUIRED"}
Traceback (most recent call last):
  File "/Users/uu106677/Documents/git_rls/private/ddlparse/example/example_issue.py", line 60, in <module>
    col_info.append("bq_legacy_data_type =  {}".format(col.bigquery_legacy_data_type))
  File "/Users/uu106677/.pyenv/versions/3.6.5/lib/python3.6/site-packages/ddlparse/ddlparse.py", line 222, in bigquery_legacy_data_type
    return self.bigquery_data_type
  File "/Users/uu106677/.pyenv/versions/3.6.5/lib/python3.6/site-packages/ddlparse/ddlparse.py", line 216, in bigquery_data_type
    raise ValueError("Unknown data type : '{}'".format(self._data_type))
ValueError: Unknown data type : 'FK_RAILS_COL_12'

Error while parsing

Create statement:

CREATE TABLE `CheckoutCartItem`
(
    `checkoutCartItemId`        int(10) unsigned        NOT NULL AUTO_INCREMENT,
    `checkoutCartId`            int(10) unsigned        NOT NULL,
    `timeCreated`               timestamp               NOT NULL COMMENT 'time when the item was added to the cart for the first time',
    `itemId`                    int(10) unsigned        NOT NULL,
    `skuid`                     varchar(64)                      DEFAULT NULL COMMENT 'additional identifier for item (e.g. skuid that defines required size of shoe)',
    `quantity`                  int(10) unsigned        NOT NULL COMMENT 'quantity of item in cart',
    `availableQuantity`         int(10) unsigned        NOT NULL COMMENT 'available quantity when user last seen the product - by this is detected change and user is notified about it',
    `discount`                  decimal(12, 3) unsigned NOT NULL DEFAULT 0.000 COMMENT 'applied discount to the product totalAmount',
    `tax`                       tinyint(2) unsigned     NOT NULL DEFAULT 0 COMMENT 'applied VAT to the amount',
    `amountPerItem`             decimal(13, 4)          NOT NULL COMMENT 'amount per item with VAT (by this is detected change and user is notified about it)',
    `amountPerItemExcludingVat` decimal(13, 4)          NOT NULL COMMENT 'amount per item without VAT',
    `totalAmount`               decimal(13, 4)          NOT NULL COMMENT 'amount * quantity +/- special tax, discount, ... with VAT (by this is detected change and user is notified about it)',
    `totalAmountExcludingVat`   decimal(13, 4)          NOT NULL COMMENT 'amount * quantity +/- special tax, discount, ... without VAT',
    PRIMARY KEY (`checkoutCartItemId`),
    UNIQUE KEY `checkoutCartId_itemId` (`checkoutCartId`, `itemId`),
    CONSTRAINT `checkoutCartItem_ibfk_1` FOREIGN KEY (`checkoutCartId`) REFERENCES `CheckoutCart` (`checkoutCartId`)
) ENGINE = InnoDB
  DEFAULT CHARSET = utf8;

code:

import glob
import os
from ddlparse.ddlparse import DdlParse

for filename in glob.glob('/home/jirka/Projekty/Glami/klarka/schema/mysql/tables/CheckoutCartItem.sql'): # TODO all
    with open(filename, 'r') as f: # open in readonly mode
        table = DdlParse().parse(f.read(), source_database=DdlParse.DATABASE.mysql)
        for col in table.columns.values():
            print(col.name)

output columns:

checkoutCartItemId
unsigned
NULL
checkoutCartId

The parser stops when it encounters a NULL column definition

The below SQL produces only one column as output.

Input:

CREATE TABLE sales (
	sale_id int4 NULL,
	posted_time timestamp,
	sale_hdr_title varchar,
	sale_inserted_at timestamp,
	sale_updated_at timestamp,
	sale_title varchar,
	sale_name varchar,
	sale_description text,
	sale_status varchar,
);

Output:

#standardSQL
CREATE TABLE `project.dataset.sales`
(
  sale_id INT64
)

ddlparse throws an error when primary key is composite

It seems to me that with the use of 'delimitedList', the primary key should allow multiple columns.

...Optional(Suppress(_CONSTRAINT) + Optional(_SUPPRESS_QUOTE) + Word(alphanums + "_")("name") + Optional(
                    _SUPPRESS_QUOTE))
                + (
                        (
                                (_PRIMARY_KEY ^ _UNIQUE ^ _UNIQUE_KEY ^ _NOT_NULL)("type")
                                + Optional(_SUPPRESS_QUOTE) + Optional(Word(alphanums + "_"))("name") + Optional(
                            _SUPPRESS_QUOTE)
                                + _LPAR + Group(delimitedList(
                            Optional(_SUPPRESS_QUOTE) + Word(alphanums + "_") + Optional(_SUPPRESS_QUOTE)))(
                            "constraint_columns") + _RPAR
                        )...

However, when provided with a line such as: primary key (column1, column2, column3), pyparsing raises the following exception:

raise ParseException(instring, loc, self.errmsg, self)
pyparsing.ParseException: Expected ")" (at char 325), (line:9, col:38)

When provided a table with a primary key constraint on only one column (primary key (column1)), ddlparse handles this with no problem.
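
A minimal sketch of the failing case (my reconstruction; the table and column names are illustrative):

from ddlparse import DdlParse

sample_ddl = """
CREATE TABLE sample_table (
  column1 integer,
  column2 integer,
  column3 integer,
  primary key (column1, column2, column3)
);
"""

DdlParse().parse(sample_ddl)  # raises a pyparsing.ParseException like the one above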

Add support for constraint names with quotes

Problem

If the constraint name specified in the DDL's CONSTRAINT clause is enclosed in double quotes ( " ), an error occurs.

Pattern A: OK

code

$ cat test_none_quote.py
from ddlparse import DdlParse

sample_ddl = """
CREATE TABLE "SAMPLE_B"
(    "COL1" CHAR(10),
     "COL2" VARCHAR2(200),
      CONSTRAINT SAMPLE_B_PK PRIMARY KEY ("COL1")
)
"""

table = DdlParse().parse(sample_ddl)
parser = DdlParse(sample_ddl)
table = parser.parse()

print(table.to_bigquery_fields())

result

$ python test_none_quote.py
[{"name": "COL1", "type": "STRING", "mode": "REQUIRED"},{"name": "COL2", "type": "STRING", "mode": "NULLABLE"}]

Pattern B: NG

code

$ cat test_quote.py
from ddlparse import DdlParse

sample_ddl = """
CREATE TABLE "SAMPLE_A"
(    "COL1" CHAR(10),
     "COL2" VARCHAR2(200),
      CONSTRAINT "SAMPLE_A_PK" PRIMARY KEY ("COL1")
)
"""

table = DdlParse().parse(sample_ddl)
parser = DdlParse(sample_ddl)
table = parser.parse()

print(table.to_bigquery_fields())

result

$ python test_quote.py
Traceback (most recent call last):
  File "test_quote.py", line 15, in <module>
    print(table.to_bigquery_fields())
  File "/Users/anonymous/anaconda3/lib/python3.6/site-packages/ddlparse/ddlparse.py", line 305, in to_bigquery_fields
    return self._columns.to_bigquery_fields(name_case)
  File "/Users/anonymous/anaconda3/lib/python3.6/site-packages/ddlparse/ddlparse.py", line 253, in to_bigquery_fields
    bq_fields.append(col.to_bigquery_field(name_case))
  File "/Users/anonymous/anaconda3/lib/python3.6/site-packages/ddlparse/ddlparse.py", line 217, in to_bigquery_field
    return '{{"name": "{}", "type": "{}", "mode": "{}"}}'.format(self._get_name(name_case), self.bigquery_data_type, self.bigquery_mode)
  File "/Users/anonymous/anaconda3/lib/python3.6/site-packages/ddlparse/ddlparse.py", line 206, in bigquery_data_type
    raise ValueError("Unknown data type : '{}'".format(self._data_type))
ValueError: Unknown data type : 'SAMPLE_A_PK'

Environment

$ pip freeze | grep ddlparse
ddlparse==1.1.2

Add support for Cloud Spanner DDLs

Several Spanner data types are currently not supported: https://cloud.google.com/spanner/docs/data-types

In particular, STRING (see #62), BYTES (see #64) and ARRAY.

ARRAY seems to be more difficult to add, see https://cloud.google.com/spanner/docs/data-types#array_type

ARRAY types are declared using the angle brackets (< and >).

I managed to solve the array issue by using Optional(Regex(r"\<(.*?)\>"))("array_brackets") within _CREATE_TABLE_STATEMENT

Additionally, Spanner DDLs can contain something like STRING(MAX); MAX needs to be supported in addition to numerical lengths.

Todos:
  • Add spanner to DdlParse.DATABASE options.
  • _CREATE_TABLE_STATEMENT needs to be adapted for Spanner.
  • For arrays, Spanner uses something like ARRAY<BOOL>, but we also need to account for lengths indicated as a string, for example BYTES(MAX).

I've tested with this DDL:

CREATE TABLE ManuelsTable (
  col1 INT64,
  col2 STRING(MAX),
  col3 TIMESTAMP,
  col4 DATE,
  col5 BYTES(MAX),
  col6 ARRAY<BOOL>,
  col7 BOOL,
  col8 FLOAT64,
  col9 NUMERIC,
) PRIMARY KEY(col1) 

Parsing stops without error on default varying varchar

Hello,

the field "imputation" below has a default value definition that stops DdlParse to go farther.
If you remove this default value definition (comment provided below), parsing is going on fine with the lastest columns.

This DDL is coming from a PG dump:

    --
    -- PostgreSQL database dump
    --
    
    -- Dumped from database version 9.6.22
    -- Dumped by pg_dump version 9.6.22

Example provided:

test_ddl = """
CREATE TABLE public.t_ref_structure (
id uuid NOT NULL,
"version" numeric NULL DEFAULT 0,
id_number int4 NULL,
created_date timestamptz NULL DEFAULT now(),
created_by varchar(240) NULL,
updated_date timestamptz NULL,
updated_by varchar(240) NULL,
check_cancel bool NOT NULL DEFAULT false,
cancel_date timestamptz NULL,
cancel_by varchar(240) NULL,
imputation varchar(10) NULL DEFAULT 'A'::character varying,
-- imputation varchar(10) NULL DEFAULT 'A',
ordre numeric NULL,
categorisation_sap varchar(240) NULL,
is_notifiable bool NULL DEFAULT false,
CONSTRAINT t_ref_structure_pkey PRIMARY KEY (id),
CONSTRAINT fk_str_imputation FOREIGN KEY (imputation) REFERENCES public.t_ref_imputation(code_imputation)
);
"""

parser = DdlParse()
parser.source_database = DdlParse.DATABASE.postgresql
parser.ddl = test_ddl
table = parser.parse()
table.columns

character varying seems to be handled by a regex at ddlparse/ddlparse.py:202, but I have not yet found the issue.

I'll submit a PR if I find the issue, but not yet.
Thank you

Unable to parse MySQL foreign key constraints (ON DELETE, CASCADE)

ddlparse appears to be unable to correctly parse the following DDL:

CREATE TABLE `foo`
(
    `id`            INT(10) UNSIGNED NOT NULL,
    `some_other_id` INT(10) UNSIGNED NOT NULL,
    UNIQUE KEY `foo` (`id`, `some_other_id`),
    CONSTRAINT `fk_1` FOREIGN KEY (`id`)
        REFERENCES `bar` (`id`)
        ON DELETE CASCADE
        ON UPDATE NO ACTION
) ENGINE = InnoDB
  DEFAULT CHARSET = latin1;

When I replace the example DDL with the one above, the output is:

Traceback (most recent call last):
  File "ddl.py", line 27, in <module>
    print(table.to_bigquery_fields())
  File "/Users/byarmis/venvs/bootstrapper/lib/python3.6/site-packages/ddlparse/ddlparse.py", line 506, in to_bigquery_fields
    return self._columns.to_bigquery_fields(name_case)
  File "/Users/byarmis/venvs/bootstrapper/lib/python3.6/site-packages/ddlparse/ddlparse.py", line 444, in to_bigquery_fields
    bq_fields.append(col.to_bigquery_field(name_case))
  File "/Users/byarmis/venvs/bootstrapper/lib/python3.6/site-packages/ddlparse/ddlparse.py", line 371, in to_bigquery_field
    type = self.bigquery_legacy_data_type
  File "/Users/byarmis/venvs/bootstrapper/lib/python3.6/site-packages/ddlparse/ddlparse.py", line 335, in bigquery_legacy_data_type
    return self.bigquery_data_type
  File "/Users/byarmis/venvs/bootstrapper/lib/python3.6/site-packages/ddlparse/ddlparse.py", line 329, in bigquery_data_type
    raise ValueError("Unknown data type : '{}'".format(self._data_type))
ValueError: Unknown data type : 'DELETE'

Skipping past the BigQuery aspects (commenting out everything from # parse pattern (1-2) through to print("* COLUMN *") as well as anything mentioning BigQuery in the example body), the parsed columns are as follows:

* COLUMN *
{
  "name": "id",
  "data_type": "INT",
  "length": 10,
  "precision(=length)": 10,
  "scale": null,
  "is_unsigned": true,
  "is_zerofill": false,
  "constraint": "NOT NULL UNIQUE",
  "not_null": true,
  "PK": false,
  "unique": true,
  "auto_increment": false,
  "distkey": false,
  "sortkey": false,
  "encode": null,
  "default": null,
  "comment": null,
  "description(=comment)": null
}
{
  "name": "some_other_id",
  "data_type": "INT",
  "length": 10,
  "precision(=length)": 10,
  "scale": null,
  "is_unsigned": true,
  "is_zerofill": false,
  "constraint": "NOT NULL UNIQUE",
  "not_null": true,
  "PK": false,
  "unique": true,
  "auto_increment": false,
  "distkey": false,
  "sortkey": false,
  "encode": null,
  "default": null,
  "comment": null,
  "description(=comment)": null
}
{
  "name": "ON",
  "data_type": "DELETE",
  "length": null,
  "precision(=length)": null,
  "scale": null,
  "is_unsigned": false,
  "is_zerofill": false,
  "constraint": "",
  "not_null": false,
  "PK": false,
  "unique": false,
  "auto_increment": false,
  "distkey": false,
  "sortkey": false,
  "encode": null,
  "default": null,
  "comment": null,
  "description(=comment)": null
}
{
  "name": "CASCADE",
  "data_type": "ON",
  "length": null,
  "precision(=length)": null,
  "scale": null,
  "is_unsigned": false,
  "is_zerofill": false,
  "constraint": "",
  "not_null": false,
  "PK": false,
  "unique": false,
  "auto_increment": false,
  "distkey": false,
  "sortkey": false,
  "encode": null,
  "default": null,
  "comment": null,
  "description(=comment)": null
}
{
  "name": "UPDATE",
  "data_type": "NO",
  "length": null,
  "precision(=length)": null,
  "scale": null,
  "is_unsigned": false,
  "is_zerofill": false,
  "constraint": "",
  "not_null": false,
  "PK": false,
  "unique": false,
  "auto_increment": false,
  "distkey": false,
  "sortkey": false,
  "encode": null,
  "default": null,
  "comment": null,
  "description(=comment)": null
}

This is a valid MySQL DDL in both 5.7 (what I'm using) and 8.0 (the current version). I am running ddlparse v1.7.0.

Not completely parsed with block comments

Example

  • Only Col_01 was parsed successfully.
  • Fields from Col_02 onward are not output.

Example code

from ddlparse import DdlParse

sample_ddl = """
CREATE TABLE Sample_Table (
  Col_01 varchar(100),
  -- comment
  -- comment
  Col_02 char(200),
  Col_03 text
);
"""


table = DdlParse().parse(sample_ddl)

print("* COLUMN *")
for col in table.columns.values():
    col_info = []
    col_info.append("name = {}".format(col.name))
    col_info.append("data_type = {}".format(col.data_type))
    col_info.append("length = {}".format(col.length))
    col_info.append("precision(=length) = {}".format(col.precision))
    col_info.append("scale = {}".format(col.scale))
    col_info.append("constraint = {}".format(col.constraint))
    col_info.append("not_null =  {}".format(col.not_null))
    col_info.append("PK =  {}".format(col.primary_key))
    col_info.append("unique =  {}".format(col.unique))
    col_info.append("bq_legacy_data_type =  {}".format(col.bigquery_legacy_data_type))
    col_info.append("bq_standard_data_type =  {}".format(col.bigquery_standard_data_type))
    col_info.append("BQ {}".format(col.to_bigquery_field()))
    print(" : ".join(col_info))

Output

* COLUMN *
name = Col_01 : data_type = VARCHAR : length = 100 : precision(=length) = 100 : scale = None : constraint =  : not_null =  False : PK =  False : unique =  False : bq_legacy_data_type =  STRING : bq_standard_data_type =  STRING : BQ {"name": "Col_01", "type": "STRING", "mode": "NULLABLE"}

Support Oracle's `TIMESTAMP(fractional_seconds_precision)` data type

Requirements

  • Support Oracle's TIMESTAMP(fractional_seconds_precision) data type.
    • The valid range of fractional_seconds_precision is 0 to 9.
    • BigQuery's supported seconds precision is 0 to 6 in both Legacy and Standard SQL (see the sketch below).
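
A sketch of the kind of Oracle DDL this would cover (my own example of the data type in question):

from ddlparse import DdlParse

sample_ddl = """
CREATE TABLE Sample_Table (
  Created_At TIMESTAMP(6),
  Updated_At TIMESTAMP(9)
);
"""

# The request is for these fractional-seconds precisions to be handled
# when converting to BigQuery fields / DDL.
table = DdlParse().parse(ddl=sample_ddl, source_database=DdlParse.DATABASE.oracle)
print(table.to_bigquery_fields())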

Links

Unable to convert DDL for Oracle table with LONG datatype

Unable to convert DDL for Oracle table with LONG datatype

Traceback (most recent call last):
  File "test2.py", line 65, in <module>
    col_info["bq_legacy_data_type"] = col.bigquery_legacy_data_type
  File "/Users/x/Documents/ddlparse/env/lib/python3.8/site-packages/ddlparse/ddlparse.py", line 342, in bigquery_legacy_data_type
    return self.bigquery_data_type
  File "/Users/x/Documents/ddlparse/env/lib/python3.8/site-packages/ddlparse/ddlparse.py", line 336, in bigquery_data_type
    raise ValueError("Unknown data type : '{}'".format(self._data_type))
ValueError: Unknown data type : 'LONG'

@shinichi-takii

Add supports the parse of COMMENT statements

Requirements

  • Add supports the parse of COMMENT statements.
  • Add DdlParseColumn.comment property.
  • Add DdlParseColumn.description property for the DdlParseColumn.comment property alias.

Example

DDL example:

CREATE TABLE Sample_Table (
  Col_01 character varying(100) PRIMARY KEY COMMENT 'Single Quote',
  Col_02 text NOT NULL COMMENT "Double Quote",
  Col_03 integer COMMENT 'in "Quote"',
  Col_04 integer COMMENT "in 'Quote'",
  Col_05 integer COMMENT 'コメント is full-width(Japanese) character',
  Col_06 float -- COMMENT 'Comment out'
);

Add support for Oracle Length Semantics for Character Datatypes

Problem

When the size in a string column definition is specified as a character count (CHAR length semantics), parsing does not finish correctly.
ex.) A table that should have 10 columns only outputs columns up to the one with the character-count size.

Example

$ cat test1.py
from ddlparse import DdlParse

sample_ddl = """
CREATE TABLE SAMPLE_TABLE
(    COL1 NUMBER(20,0) NOT NULL ENABLE,
     COL2 VARCHAR2(45 CHAR) NOT NULL ENABLE,
     COL3 VARCHAR2(45 CHAR) NOT NULL ENABLE,
     COL4 NUMBER(20,0) NOT NULL ENABLE,
     COL5 NUMBER(20,0) NOT NULL ENABLE,
     COL6 NUMBER(20,0) NOT NULL ENABLE,
     COL7 NUMBER(20,0) NOT NULL ENABLE,
     COL8 NUMBER(20,0) NOT NULL ENABLE,
     COL9 NUMBER(20,0) NOT NULL ENABLE,
     COL10 NUMBER(20,0) NOT NULL ENABLE
)
"""

table = DdlParse().parse(sample_ddl)
parser = DdlParse(sample_ddl)
table = parser.parse()

print(table.to_bigquery_fields())

result

$ python test1.py
[{"name": "COL1", "type": "INTEGER", "mode": "REQUIRED"},{"name": "COL2", "type": "STRING", "mode": "NULLABLE"}]

Environment

$ pip freeze | grep ddlparse
ddlparse==1.1.2

Unknown data type VARBINARY

Hi,
Thank you all for this great package.
While using it to parse DDL I have stumbled on the error: ValueError: Unknown data type : 'VARBINARY'.
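
A reduced sketch of how I hit it (my own example, not the real DDL):

from ddlparse import DdlParse

sample_ddl = """
CREATE TABLE sample_table (
  payload VARBINARY(255)
);
"""

table = DdlParse().parse(sample_ddl)
for col in table.columns.values():
    # The error surfaces when converting the column to a BigQuery type.
    print(col.bigquery_legacy_data_type)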

Is this a bug, or am I missing something?

Regards,
Darek

Add support inline comment

Requirements

Add support inline comment.

Example DDL

CREATE TABLE Sample_Table (
  Col_01 number(50,1), -- comment
  Col_02 decimal(60,2)  -- comment
);

MySQL CREATE TABLE parsing fails; some columns are missing.

I tried to output a BigQuery column definition from a MySQL table definition, but some columns are missing.

Please show me how to avoid this by adjusting the CREATE TABLE statement, or let me know if I am using ddlparse incorrectly.

reproduce

environment

$ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.15.7
BuildVersion:	19H2
$ python -V
Python 3.8.3
$ ll $(python -c "import site; print (site.getsitepackages()[0])")/ | grep ddlparse
drwxr-xr-x    5 sakamossan  staff   160B  9 25 12:49 ddlparse
drwxr-xr-x    8 sakamossan  staff   256B  9 25 12:49 ddlparse-1.8.0.dist-info

execution

I used following files

/tmp/_.sql

CREATE TABLE `t` (
  `id` INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
  `created` DATETIME NULL,
  `modified` DATETIME NULL
) ENGINE = InnoDB;

/tmp/_.py

#!/usr/bin/env python
import sys
import json
from ddlparse import DdlParse

sql = sys.stdin.read()
table = DdlParse().parse(ddl=sql, source_database=DdlParse.DATABASE.mysql)
print(table.to_bigquery_fields())
$ cat /tmp/_.sql | /tmp/_.py | jq .
[
  {
    "name": "id",
    "type": "INTEGER",
    "mode": "REQUIRED"
  },
  {
    "name": "created",
    "type": "DATETIME",
    "mode": "NULLABLE"
  }
]

The modified column is not output.

Problem DDL parse pattern

Problem DDL Example

CREATE TABLE `example_table` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `class_id` int(11) DEFAULT NULL,
  `group_id` int(11) DEFAULT NULL,
  `valid_date` date NOT NULL DEFAULT '0000-00-00',
  `count` int(11) DEFAULT NULL,
  `total` int(11) DEFAULT '0',
  `flag_1` int(11) DEFAULT '0',
  `balue_1` tinyint(1) DEFAULT '0',
  `flag_2` int(11) DEFAULT '0',
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `lock_version` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`,`valid_date`),
  UNIQUE KEY `uniq_vd_cc_sg` (`valid_date`,`class_id`,`group_id`,`flag_2`),
  KEY `sg_cc_vd` (`group_id`,`class_id`,`valid_date`)
) ENGINE=InnoDB AUTO_INCREMENT=21619109476 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Output BigQuery JSON Schema

[{
	"name": "id",
	"type": "INTEGER",
	"mode": "REQUIRED"
}, {
	"name": "class_id",
	"type": "INTEGER",
	"mode": "NULLABLE"
}, {
	"name": "group_id",
	"type": "INTEGER",
	"mode": "NULLABLE"
}, {
	"name": "valid_date",
	"type": "DATE",
	"mode": "REQUIRED"
}]

Postgres/Redshift: DDL with "::" syntax in a field attribute is not parsed

Example

DDL (Redshift)

CREATE TABLE IF NOT EXISTS "sample_schema"."sample_table"
(
	"div_cd" VARCHAR(2) NOT NULL
	,"div_name" VARCHAR(30) NOT NULL
	,"org_cd" VARCHAR(8) NOT NULL
	,"org_name" VARCHAR(60) NOT NULL
	,"team_cd" VARCHAR(2) NOT NULL
	,"team_name" VARCHAR(120) NOT NULL
	,"personal_cd" VARCHAR(7) NOT NULL
	,"personal_name" VARCHAR(300) NOT NULL
	,"username" VARCHAR(6) NOT NULL
	,"staff_flg" CHAR(1)  DEFAULT '0'::bpchar ENCODE lzo
	,"leader_flg" CHAR(1)  DEFAULT '0'::bpchar ENCODE lzo
)
DISTSTYLE EVEN
;

ddlparse output BigQuery JSON schema

The leader_flg field is not parsed.

[{
	"name": "div_cd", "type": "STRING", "mode": "REQUIRED"
}, {
	"name": "div_name", "type": "STRING", "mode": "REQUIRED"
}, {
	"name": "org_cd", "type": "STRING", "mode": "REQUIRED"
}, {
	"name": "org_name", "type": "STRING", "mode": "REQUIRED"
}, {
	"name": "team_cd", "type": "STRING", "mode": "REQUIRED"
}, {
	"name": "team_name", "type": "STRING", "mode": "REQUIRED"
}, {
	"name": "personal_cd", "type": "STRING", "mode": "REQUIRED"
}, {
	"name": "personal_name", "type": "STRING", "mode": "REQUIRED"
}, {
	"name": "username", "type": "STRING", "mode": "REQUIRED"
}, {
	"name": "staff_flg", "type": "STRING", "mode": "NULLABLE"
}]

If the ddl comment has multibyte characters, it will not be parsed correctly

Problem

If the ddl comment has multibyte characters, it will not be parsed correctly.

NG (multibyte comments)

code

$ cat comment_multibyte.py
from ddlparse import DdlParse

sample_ddl = """
CREATE TABLE `table_01`
(
`col_1`       CHAR(7)     NOT NULL COMMENT 'コメント1',
`col_2`       CHAR(3)     NOT NULL COMMENT 'コメント2',
`col_3`       INT         NOT NULL COMMENT 'コメント3'
)
"""

table = DdlParse().parse(sample_ddl)
parser = DdlParse(sample_ddl)
table = parser.parse()

print(table.to_bigquery_fields())

result

$ python comment_multibyte.py
[{"name": "col_1", "type": "STRING", "mode": "REQUIRED"}]

OK (single-byte comments)

code

$ cat comment.py
from ddlparse import DdlParse

sample_ddl = """
CREATE TABLE `table_01`
(
`col_1`       CHAR(7)     NOT NULL COMMENT 'comment1',
`col_2`       CHAR(3)     NOT NULL COMMENT 'comment2',
`col_3`       INT         NOT NULL COMMENT 'comment3'
)
"""

table = DdlParse().parse(sample_ddl)
parser = DdlParse(sample_ddl)
table = parser.parse()

print(table.to_bigquery_fields())

result

$ python comment.py
[{"name": "col_1", "type": "STRING", "mode": "REQUIRED"},{"name": "col_2", "type": "STRING", "mode": "REQUIRED"},{"name": "col_3", "type": "INTEGER", "mode": "REQUIRED"}]

Environment

$ pip freeze | grep ddlparse
ddlparse==1.2.3

Not parsing all fields from PostgreSQL DDL

I use 1.2.0, this way:

DdlParse().parse(ddl=sql_ddl, source_database=DdlParse.DATABASE.postgresql).to_bigquery_fields()

Input:

CREATE TABLE my_table (
    id integer NOT NULL,
    user_id integer,
    first_name character varying,
    last_name character varying,
    birth_date timestamp without time zone,
    gender_status integer,
    marital_status integer,
    street_address text,
    zip_code character varying,
    city character varying,
    country character varying,
    handphone character varying,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    pending_deposit_amount numeric(18,4) DEFAULT 0.0,
    free_amount numeric(18,4) DEFAULT 0.0,
    secured_pledge_amount numeric(18,4) DEFAULT 0.0,
    unsecured_pledge_amount numeric(18,4) DEFAULT 0.0,
    pending_withdraw_amount numeric(18,4) DEFAULT 0.0
);

Output:

[
  {
    "name": "id",
     "type": "INTEGER",
     "mode": "REQUIRED"
  },
  {
    "name": "user_id",
     "type": "INTEGER",
     "mode": "NULLABLE"
  },
  {
    "name": "first_name",
     "type": "STRING",
     "mode": "NULLABLE"
  },
  {
    "name": "last_name",
     "type": "STRING",
     "mode": "NULLABLE"
  },
  {
    "name": "birth_date",
     "type": "DATETIME",
     "mode": "NULLABLE"
  },
  {
    "name": "gender_status",
     "type": "INTEGER",
     "mode": "NULLABLE"
  },
  {
    "name": "marital_status",
     "type": "INTEGER",
     "mode": "NULLABLE"
  },
  {
    "name": "street_address",
     "type": "STRING",
     "mode": "NULLABLE"
  },
  {
    "name": "zip_code",
     "type": "STRING",
     "mode": "NULLABLE"
  },
  {
    "name": "city",
     "type": "STRING",
     "mode": "NULLABLE"
  },
  {
    "name": "country",
     "type": "STRING",
     "mode": "NULLABLE"
  },
  {
    "name": "handphone",
     "type": "STRING",
     "mode": "NULLABLE"
  },
  {
    "name": "created_at",
     "type": "DATETIME",
     "mode": "REQUIRED"
  },
  {
    "name": "updated_at",
     "type": "DATETIME",
     "mode": "REQUIRED"
  },
  {
    "name": "pending_deposit_amount",
     "type": "FLOAT",
     "mode": "NULLABLE"
  }
]
