Git Product home page Git Product logo

avro-schema-generator's Introduction

Build Status Maven Central Codecov Chat in Gitter MIT License

avro-schema-generator

Library for generating avro schema files (.avsc) based on DB tables structure.

How it works:

  1. Prepare a db connection URL, username and a password. Use these to create a DbSchemaExtractor
  2. If you're fancy, configure FormatterConfig and AvroConfig.
  3. Tell schema extractor to give you avro models for tables that you're interested in.
  4. Pass those to SchemaGenerator which will give you perfect avro schemas.

Here's basic example:

// Initialize db crawler that will create avro model objects
DbSchemaExtractor schemaExtractor = new DbSchemaExtractor("jdbc:mysql://localhost:3306", "root", "pass");

AvroConfig avroConfig = new AvroConfig("some.namespace");
// Get avro models for a few tables
List<AvroSchema> schemas = schemaExtractor.getForTables(avroConfig, "mydb", "users", "payments");

for (AvroSchema schema : schemas) {
    // Pass avro model to SchemaGenerator, get schema and print it out.
    String schemaJson = SchemaGenerator.generate(schema);
    System.out.println(schemaJson);
}

More complex example:

DbSchemaExtractor schemaExtractor = new DbSchemaExtractor("jdbc:mysql://localhost:3306", "root", "pass");

// Some of available configuration options
AvroConfig avroConfig = new AvroConfig("some.namespace")
    .setRepresentEnumsAsStrings(true) // use 'string' avro type instead of 'enum' for enums
    .setAllFieldsDefaultNull(true)    // adds default: 'null' to fields definition
    .setNullableTrueByDefault(true)   // makes all fields nullable
    .setUseSqlCommentsAsDoc(true)     // use sql comments to fill 'doc' field
    .setSchemaNameMapper(new ToCamelCase().andThen(new RemovePlural())) // specify table name transformation to be used for schema name
    .setUnknownTypeResolver(type -> "string") // specify what to do with custom and unsupported db types
    .setDateTypeClass(Date.class) // add hint for avro compiler about which class to use for dates
    .setAvroSchemaPostProcessor((schema, table) -> {
        // adding some custom properties to avro schema
        schema.addCustomProperty("db-schema-name", "mydb");
        schema.addCustomProperty("db-table-name", table.getName());
});

// Get avro models for a few tables
List<AvroSchema> schemas = schemaExtractor.getAll(avroConfig);

// You can specify some formatting options by creating a FormatterConfig and passing it to SchemaGenerator.
FormatterConfig formatterConfig = FormatterConfig.builder()
            .setPrettyPrintSchema(true)
            .setPrettyPrintFields(false)
            .setIndent("    ")
            .build();

for (AvroSchema schema : schemas) {
    String schemaJson = SchemaGenerator.generate(schema, formatterConfig);
    System.out.println(schemaJson);
}

Maven dependency

<dependency>
  <groupId>com.github.artur-tamazian</groupId>
  <artifactId>avro-schema-generator</artifactId>
  <version>1.0.10</version>
</dependency>

Supported databases

DB crawling is done using the SchemaCrawler http://www.schemacrawler.com/. So it should work fine with RDBMS mentioned here: http://www.schemacrawler.com/database-support.html.

avro-schema-generator itself was successfully used with MySQL and PostgreSQL.

TODO / Suggested contributions

  • Refactor using Project Lombok
  • Add missing or db-specific types support
  • Migrate to Gradle

avro-schema-generator's People

Contributors

artur-tamazian avatar atfire avatar dependabot[bot] avatar mredjem avatar vikas22 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

avro-schema-generator's Issues

Update SchemaCrawler

I want to suggest updating the shemaCrawler and checking the error unknown data type: number when trying to generate avro from an Oracle database

Support for DB names with `-` in name.

Hi,
for databases which contain - (hyphen) in db name, avro generation fails with following error.

Exception in thread "main" java.lang.RuntimeException: schemacrawler.schemacrawler.SchemaCrawlerException: No matching schemas found
	at com.at.avro.DbSchemaExtractor.get(DbSchemaExtractor.java:109)
	at com.at.avro.DbSchemaExtractor.getForTables(DbSchemaExtractor.java:57)
	at com.razorpay.edh.test_avro$.delayedEndpoint$com$razorpay$edh$test_avro$1(test_avro.scala:45)
	at com.razorpay.edh.test_avro$delayedInit$body.apply(test_avro.scala:11)
	at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
	at scala.App$$anonfun$main$1.apply(App.scala:76)
	at scala.App$$anonfun$main$1.apply(App.scala:76)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
	at scala.App$class.main(App.scala:76)
	at com.razorpay.edh.test_avro$.main(test_avro.scala:11)
	at com.razorpay.edh.test_avro.main(test_avro.scala)
Caused by: schemacrawler.schemacrawler.SchemaCrawlerException: No matching schemas found
	at schemacrawler.crawl.SchemaCrawler.crawlSchemas(SchemaCrawler.java:426)
	at schemacrawler.crawl.SchemaCrawler.crawl(SchemaCrawler.java:144)
	at schemacrawler.tools.catalogloader.SchemaCrawlerCatalogLoader.loadCatalog(SchemaCrawlerCatalogLoader.java:59)
	at schemacrawler.tools.catalogloader.ChainedCatalogLoader.loadCatalog(ChainedCatalogLoader.java:77)
	at schemacrawler.tools.utility.SchemaCrawlerUtility.getCatalog(SchemaCrawlerUtility.java:103)
	at schemacrawler.tools.utility.SchemaCrawlerUtility.getCatalog(SchemaCrawlerUtility.java:85)
	at com.at.avro.DbSchemaExtractor.get(DbSchemaExtractor.java:84)

it works perfectly fine with the databases which does not contain - in the their name.

if have used following code to reproduce issue

.
.
.
val db = "test-delete"
val url = s"jdbc:mysql://${host}:${port}/${db}"
val schemaExtractor = new DbSchemaExtractor(url, user, password)
schemas = schemaExtractor
      .getForTables(avroConfig, db, tables: _*)
      .asScala
      .toList

while searching at the issue at https://github.com/schemacrawler/SchemaCrawler
i came across this related issue. schemacrawler/SchemaCrawler#176
if possible please check on how to implement the solution.

Used avro-schema-generator version: 1.0.7
Please do let me know if you need any info or help.

Thanks and regards,
Raj

Value out of range for datatype java.lang.Integer for column `cardinality`.

While generating avro schemas for some tables it is failing with above error.

from initial debugging it looks like it is using integer datatype to store values fetched from cardinality column from information_schema.statistics table. i am guessing this is called when trying to fetch index metadata from db.
attaching more error logs for your reference.

Caused by: com.mysql.cj.exceptions.NumberOutOfRange: Value '2589011712' is outside of valid range for type java.lang.Integer

warn: Could not read integer value for column

java.sql.SQLDataException: Value '2589011712' is outside of valid range for type java.lang.Integer
App > at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:114)
App > at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
App > at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:89)
App > at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:63)
App > at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:73)
App > at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:92)
App > at com.mysql.cj.jdbc.result.ResultSetImpl.getObject(ResultSetImpl.java:1411)
App > at com.mysql.cj.jdbc.result.ResultSetImpl.getInt(ResultSetImpl.java:816)
App > at com.mysql.cj.jdbc.result.ResultSetImpl.getInt(ResultSetImpl.java:837)
App > at schemacrawler.crawl.MetadataResultSet.getInt(MetadataResultSet.java:340)
App > at schemacrawler.crawl.IndexRetriever.createIndexForTable(IndexRetriever.java:182)
App > at schemacrawler.crawl.IndexRetriever.createIndexes(IndexRetriever.java:146)
App > at schemacrawler.crawl.IndexRetriever.retrieveIndexesFromMetadata(IndexRetriever.java:343)
App > at schemacrawler.crawl.IndexRetriever.retrieveIndexesFromMetadata(IndexRetriever.java:359)
App > at schemacrawler.crawl.IndexRetriever.retrieveIndexes(IndexRetriever.java:98)
App > at schemacrawler.crawl.SchemaCrawler.lambda$crawlTables$22(SchemaCrawler.java:615)
App > at sf.util.StopWatch.time(StopWatch.java:177)
App > at schemacrawler.crawl.SchemaCrawler.crawlTables(SchemaCrawler.java:609)
App > at schemacrawler.crawl.SchemaCrawler.crawl(SchemaCrawler.java:766)
App > at schemacrawler.utility.SchemaCrawlerUtility.getCatalog(SchemaCrawlerUtility.java:89)
App > at com.at.avro.DbSchemaExtractor.get(DbSchemaExtractor.java:71)
App > at com.at.avro.DbSchemaExtractor.getForTables(DbSchemaExtractor.java:49)

note: i am very new to github OSS. please ignore my mistakes and please let me know if you need any more info on this.
Thanks

Need support for bigint/int unsigned type

Ex: schema

| Field                    | Type                         | Null   | Key  | Default  |
| amount               | bigint(20) unsigned | NO   | MUL | NULL    |

Library throws

Exception in thread "main" java.lang.IllegalArgumentException: unknown data type: bigint unsigned
	at com.at.avro.config.AvroConfig.lambda$new$2(AvroConfig.java:30)
	at com.at.avro.AvroTypeUtil.getPrimitiveType(AvroTypeUtil.java:97)
	at com.at.avro.AvroTypeUtil.getAvroType(AvroTypeUtil.java:43)
	at com.at.avro.AvroField.<init>(AvroField.java:25)
	at com.at.avro.AvroSchema.<init>(AvroSchema.java:26)
	at com.at.avro.DbSchemaExtractor.get(DbSchemaExtractor.java:93)
	at com.at.avro.DbSchemaExtractor.getForTables(DbSchemaExtractor.java:57)
	at scala.Function0.apply$mcV$sp(Function0.scala:34)
	at scala.Function0.apply$mcV$sp$(Function0.scala:34)
	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
	at scala.App.$anonfun$main$1$adapted(App.scala:76)
	at scala.collection.immutable.List.foreach(List.scala:388)
	at scala.App.main(App.scala:76)
	at scala.App.main$(App.scala:74)

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -2

Using avro-schema-generator to generate schemas from Oracle Database Tables, I'm getting the following error at the code block:
String schemaJson = SchemaGenerator.generate(schema);

Error:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -2
at java.lang.AbstractStringBuilder.setLength(AbstractStringBuilder.java:207)
at java.lang.StringBuilder.setLength(StringBuilder.java:76)
at com.at.avro.formatters.SchemaFormatter.formatFields(SchemaFormatter.java:48)
at com.at.avro.formatters.SchemaFormatter.toJson(SchemaFormatter.java:30)
at com.at.avro.formatters.SchemaFormatter.toJson(SchemaFormatter.java:12)
at com.at.avro.SchemaGenerator.generate(SchemaGenerator.java:22)
at GenerateAvroSchema.main(GenerateAvroSchema.java:57)

Any idea what could be causing this ?

Better SQL Server column data type support

Related to #3 if I run this .getAll(avroConfig) using MS SQL Server JDBC driver 9.2 against a completely vanilla SQL server instance/fresh install (meaning no databases other than the system default databases), I get errors on various data types such as money, sysname etc.

This is straight forward to reproduce - install SQL Server 2019 Express and then connect using the MS JDBC driver then call

String sqlConnStr = "jdbc:sqlserver://127.0.0.1\\SQL2019;databaseName=AvroGen";
DbSchemaExtractor schemaExtractor = new DbSchemaExtractor(sqlConnStr, "sa", "mypassword");
AvroConfig avroConfig = new AvroConfig("some.namespace");
List<AvroSchema> schemas = schemaExtractor.getAll(avroConfig);

Unfortunately I'm not familiar enough with Java to contribute - I barely got this working in NetBeans.

Slow performance with schema containing large number of tables

Running the following code with latest version (1.0.7) doesn't seems to provide results within reasonable amount of time on schema containing large number of tables (~650):

        DbSchemaExtractor schemaExtractor = new DbSchemaExtractor("jdbc:mysql://XXX", "XXX", "XXX");
        AvroConfig avroConfig = new AvroConfig("XXX");
        List<AvroSchema> schemas = schemaExtractor.getForTables(avroConfig, "MY_SCHEMA", "MY_TABLE");

I have substitute the actual values with either XXX or MY_....

This code intends to fetch the AvroSchema object for a single table, however, it seems that in the background the code iterate over all the tables.
I have run this code and stopped it after 30 minutes of execution. it was spewing out lots of logs similar to this:

Jun 07, 2021 1:21:31 PM schemacrawler.crawl.ResultsCrawler crawl
INFO: Crawling result set
Jun 07, 2021 1:21:31 PM schemacrawler.crawl.ResultsCrawler crawl
INFO: Total time taken for <crawlResultSet> - 00:00:00.005 hours
-100.0% - 00:00:00.005 - <retrieveResults>

Having a DB with around 500-700 tables, is a typical use case for a medium monolitch software and itsn't reasonable that it takes more than a minute to produce Avro schema for a single table.

Support for SQL array

i have a table my_table with such sql declaring a field as pgsql array

....
 exclude_pm    | text[]                      |
....

Having java code

DbSchemaExtractor schemaExtractor = new DbSchemaExtractor("...", "...", "...")

AvroConfig avroConfig = new AvroConfig("ua.eshepelyuk")

List<AvroSchema> schemas = schemaExtractor.getForTables(avroConfig, "public", "my_table")

for (AvroSchema schema : schemas) {
  System.out.println(SchemaGenerator.generate(schema))
}

I receive exception

Caught: java.lang.IllegalArgumentException: unknown data type: _text
java.lang.IllegalArgumentException: unknown data type: _text
	at com.at.avro.config.AvroConfig.lambda$new$2(AvroConfig.java:30)
	at com.at.avro.AvroTypeUtil.getPrimitiveType(AvroTypeUtil.java:103)
	at com.at.avro.AvroTypeUtil.getAvroType(AvroTypeUtil.java:43)
	at com.at.avro.AvroField.<init>(AvroField.java:25)
	at com.at.avro.AvroSchema.<init>(AvroSchema.java:26)
	at com.at.avro.DbSchemaExtractor.get(DbSchemaExtractor.java:93)
	at com.at.avro.DbSchemaExtractor.getForTables(DbSchemaExtractor.java:57)
	at AvroGen.run(AvroGen.groovy:13)

Add Avro doc field ?

Hi,

Are there any plans to add the Avro doc field to the schema and its fields ?

I'm looking to generate Avro schemas on the fly and document each field based on the SQL comments.
I've been able to successfully generate a documented schema based on what SchemaCrawler fetched.

I made a small development on a local branch to test it out.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.