

DBeam


A connector tool to extract data from SQL databases and import it into GCS using Apache Beam.

This tool is runnable locally, or on any other backend supported by Apache Beam, e.g. Cloud Dataflow.

DEVELOPMENT STATUS: Mature, maintained and used in production since August 2017. No major features or development planned.

Overview

DBeam is a tool that reads all the data from a single SQL database table, converts the data into Avro, and stores it in an appointed location, usually in GCS. It runs as a single-threaded Apache Beam pipeline.

DBeam requires the database credentials, the database table name to read, and the output location to store the extracted data into. DBeam first runs a single SELECT with LIMIT 1 against the target table to infer the table schema. After the schema is created, the job is launched, which simply streams the table contents via JDBC into the target location as Avro.
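
The limit-one schema-inference step described above can be sketched as follows. This is a minimal illustration, not DBeam's actual code; the class and method names are made up for the example.

```java
// Sketch of the schema-inference step: DBeam first issues a single-row
// SELECT so it can read the result's metadata (column names and types)
// and derive an Avro schema from it, before streaming the full table.
public class SchemaInferenceQuery {

  // Build the one-row query used purely to obtain ResultSetMetaData.
  static String limitOneQuery(String table) {
    return String.format("SELECT * FROM %s LIMIT 1", table);
  }

  public static void main(String[] args) {
    System.out.println(limitOneQuery("my_table"));
  }
}
```

In the real pipeline the returned `java.sql.ResultSetMetaData` is what drives the Avro schema generation; the query string itself is the only part shown here.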

Generated Avro Schema Type Conversion Details

dbeam-core package features

  • Supports both PostgreSQL and MySQL JDBC connectors
  • Supports Google CloudSQL managed databases
  • Currently outputs only to Avro format
  • Reads the database password from an external password file (--passwordFile) or an external KMS-encrypted password file (--passwordFileKmsEncrypted)
  • Can filter to only the records of the current day with the --partitionColumn parameter
  • Checks and fails on too-old partition dates. Snapshot dumps are not filtered by a given date/partition; when running for a partition that is too old, the job fails to avoid writing new data into old partitions. (Can be disabled with --skipPartitionCheck.)
  • Implemented as Apache Beam SDK pipeline, supporting any of its runners (tested with DirectRunner and DataflowRunner)
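
The partition-age check from the feature list can be sketched with `java.time`. This is an illustrative model only, assuming a daily partition period; the class and method names are not DBeam's.

```java
import java.time.LocalDate;
import java.time.Period;

// Sketch of the partition-age check: when --minPartitionPeriod is not
// given, the minimum acceptable partition defaults to
// now() - 2 * partitionPeriod, and older partitions fail the job.
public class PartitionCheck {

  // Default minimum partition: two partition periods before today.
  static LocalDate defaultMinPartition(LocalDate today, Period partitionPeriod) {
    return today.minus(partitionPeriod).minus(partitionPeriod);
  }

  // The export is rejected when the requested partition predates the minimum.
  static boolean isPartitionTooOld(LocalDate partition, LocalDate minPartition) {
    return partition.isBefore(minPartition);
  }

  public static void main(String[] args) {
    LocalDate today = LocalDate.of(2024, 6, 10);
    LocalDate min = defaultMinPartition(today, Period.ofDays(1)); // 2024-06-08
    System.out.println(isPartitionTooOld(LocalDate.of(2024, 6, 1), min)); // too old
  }
}
```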

DBeam export parameters

com.spotify.dbeam.options.DBeamPipelineOptions:

  --connectionUrl=<String>
    The JDBC connection url to perform the export.
  --password=<String>
    Plaintext password used by JDBC connection.
  --passwordFile=<String>
    A path to a file containing the database password.
  --passwordFileKmsEncrypted=<String>
    A path to a file containing the database password, KMS encrypted and base64
    encoded.
  --sqlFile=<String>
    A path to a file containing a SQL query (used instead of --table parameter).
  --table=<String>
    The database table to query and perform the export.
  --username=<String>
    Default: dbeam-extractor
    The database user name used by JDBC to authenticate.

com.spotify.dbeam.options.OutputOptions:

  --output=<String>
    The path for storing the output.
  --dataOnly=<Boolean>
    Default: false
    Store only the data files in output folder, skip queries, metrics and
    metadata files.

com.spotify.dbeam.options.JdbcExportPipelineOptions:
    Configures the DBeam SQL export

  --avroCodec=<String>
    Default: deflate6
    Avro codec (e.g. deflate6, deflate9, snappy).
  --avroDoc=<String>
    The top-level record doc string of the generated avro schema.
  --avroSchemaFilePath=<String>
    Path to file with a target AVRO schema.
  --avroSchemaName=<String>
    The name of the generated avro schema, the table name by default.
  --avroSchemaNamespace=<String>
    Default: dbeam_generated
    The namespace of the generated avro schema.
  --exportTimeout=<String>
    Default: P7D
    Export timeout, after this duration the job is cancelled and the export
    terminated.
  --fetchSize=<Integer>
    Default: 10000
    Configures JDBC Statement fetch size.
  --limit=<Long>
    Limit the output number of rows, indefinite by default.
  --minPartitionPeriod=<String>
    The minimum partition required for the job not to fail (when the partition
    column is not specified), by default `now() - 2*partitionPeriod`.
  --minRows=<Long>
    Default: -1
    Check that the output has at least this minimum number of rows. Otherwise
    fail the job.
  --partition=<String>
    The date/timestamp of the current partition.
  --partitionColumn=<String>
    The name of a date/timestamp column to filter data based on current
    partition.
  --partitionPeriod=<String>
    The frequency at which the export runs, used to filter based on the current
    partition and also to check whether exports are running for partitions that
    are too old.
  --preCommand=<List>
    SQL commands to be executed before query.
  --queryParallelism=<Integer>
    Max number of queries to run in parallel for exports. Single query used if
    nothing specified. Should be used with splitColumn.
  --skipPartitionCheck=<Boolean>
    Default: false
    When partition column is not specified, fails if partition is too old; set
    this flag to ignore this check.
  --splitColumn=<String>
    A long/integer column used to create splits for parallel queries. Should be
    used with queryParallelism.
  --useAvroLogicalTypes=<Boolean>
    Default: false
    Controls whether generated Avro schema will contain logicalTypes or not.

Input Avro schema file

If an input Avro schema file is provided, dbeam reads the schema file and uses some of its properties when the output Avro schema is created.

The following fields are propagated from the input schema into the output schema:

  • record.doc
  • record.namespace
  • record.field.doc

DBeam Parallel Mode

This is a pre-alpha feature currently under development and experimentation.

Read queries used by dbeam to extract data generally don't place any locks, so multiple read queries can run in parallel. When running in parallel mode with --queryParallelism specified, dbeam uses the --splitColumn argument to find the max and min values in that column. The max and min are then used as range bounds for generating queryParallelism queries, which are run in parallel to read the data. Since the splitColumn is used to calculate the query bounds, and dbeam needs to calculate intermediate bounds for each query, the type of the column must be long/int. It is assumed that the distribution of values in the splitColumn is sufficiently random and sequential. For example, if the range between the min and max of the split column is divided equally into queryParallelism parts, each part should contain an approximately equal number of records. Skew in this data results in straggling queries and hence won't provide much improvement. Having the records sequential helps the queries run faster and reduces random disk seeks.
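
The range-splitting idea above can be sketched as follows. This is an illustrative model of dividing [min, max] into contiguous ranges, not DBeam's exact implementation; the class name is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of parallel-mode range generation: divide the [min, max] of the
// --splitColumn into --queryParallelism contiguous inclusive ranges, each
// of which becomes one query's WHERE clause.
public class SplitRanges {

  static List<long[]> ranges(long min, long max, int parallelism) {
    List<long[]> result = new ArrayList<>();
    long span = max - min + 1;
    long step = (span + parallelism - 1) / parallelism; // ceiling division
    for (long lower = min; lower <= max; lower += step) {
      result.add(new long[] {lower, Math.min(lower + step - 1, max)});
    }
    return result;
  }

  public static void main(String[] args) {
    // With min=0, max=99, parallelism=4: [0,24] [25,49] [50,74] [75,99]
    for (long[] r : ranges(0, 99, 4)) {
      System.out.printf("WHERE split_col BETWEEN %d AND %d%n", r[0], r[1]);
    }
  }
}
```

Note how skew matters here: the ranges are equal in *value* span, so if most rows cluster in one range, that one query straggles while the others finish early.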

Recommended usage: Beam runs each query generated by DBeam on one dedicated vCPU (when running with the Dataflow runner), so for best performance it is recommended that the total number of vCPUs available for a given job equal the specified queryParallelism. Hence, if the Dataflow workerMachineType is n1-standard-w and numWorkers is n, then queryParallelism q should be a multiple of n*w, and the job is fastest when q = n * w.
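
As a worked instance of the sizing rule: with workerMachineType n1-standard-4 (w = 4 vCPUs per worker) and numWorkers n = 2, the total vCPU count is n * w = 8, so queryParallelism q = 8 gives each generated query a dedicated vCPU.

```java
// Worked example of the queryParallelism sizing rule: q = numWorkers * vCPUs
// per worker, so every generated query gets its own vCPU on Dataflow.
public class ParallelismSizing {
  static int bestQueryParallelism(int numWorkers, int vcpusPerWorker) {
    return numWorkers * vcpusPerWorker;
  }

  public static void main(String[] args) {
    // n1-standard-4 workers (w = 4), numWorkers n = 2 -> q = 8
    System.out.println(bestQueryParallelism(2, 4)); // 8
  }
}
```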

For an export of a table running from a dedicated PostgreSQL replica, we have seen the best performance over vCPU time and wall time with a queryParallelism of 16. Bumping queryParallelism further increases the vCPU time without offering much gain on the wall time of the complete export. It is probably good to use a queryParallelism of less than 16 when experimenting.

Building

Building and testing can be achieved with mvn:

mvn verify

In order to create a jar with all dependencies under ./dbeam-core/target/dbeam-core-shaded.jar run the following:

mvn clean package -Ppack

Usage examples

Using Java from the command line:

java -cp ./dbeam-core/target/dbeam-core-shaded.jar \
  com.spotify.dbeam.jobs.JdbcAvroJob \
  --output=gs://my-testing-bucket-name/ \
  --username=my_database_username \
  --password=secret \
  --connectionUrl=jdbc:postgresql://some.database.uri.example.org:5432/my_database \
  --table=my_table

For CloudSQL:

java -cp ./dbeam-core/target/dbeam-core-shaded.jar \
  com.spotify.dbeam.jobs.JdbcAvroJob \
  --output=gs://my-testing-bucket-name/ \
  --username=my_database_username \
  --password=secret \
  --connectionUrl=jdbc:postgresql://google/database?socketFactory=com.google.cloud.sql.postgres.SocketFactory&socketFactoryArg=project:region:cloudsql-instance \
  --table=my_table
  • When using MySQL: --connectionUrl=jdbc:mysql://google/database?socketFactory=com.google.cloud.sql.mysql.SocketFactory&cloudSqlInstance=project:region:cloudsql-instance&useCursorFetch=true
  • Note that ?useCursorFetch=true is important for MySQL, to avoid eagerly fetching all rows; more details in the MySQL docs.
  • More details can be found at CloudSQL JDBC SocketFactory

To run a cheap data extraction as a way to validate, one can add the --limit=10 --skipPartitionCheck parameters. This will run the queries, generate the schemas, and export only 10 records, which should complete in a few seconds.

Password configuration

The database password can be configured by simply passing --password=writepasswordhere, --passwordFile=/path/to/file/containing/password, or --passwordFile=gs://gcs-bucket/path/to/file/containing/password.

A more robust configuration is to point to a Google KMS encrypted file. DBeam will try to decrypt using KMS if the file ends with .encrypted (e.g. --passwordFileKmsEncrypted=gs://gcs-bucket/path/to/db-password.encrypted).

The file should contain base64-encoded encrypted content. It can be generated using gcloud as follows:

echo -n "super_secret_password" \
  | gcloud kms encrypt \
      --location "global" \
      --keyring "dbeam" \
      --key "default" \
      --project "mygcpproject" \
      --plaintext-file - \
      --ciphertext-file - \
  | base64 \
  | gsutil cp - gs://gcs-bucket/path/to/db-password.encrypted

The KMS location, keyring, and key can be configured via Java properties; the defaults are:

java \
  -DKMS_KEYRING=dbeam \
  -DKMS_KEY=default \
  -DKMS_LOCATION=global \
  -DKMS_PROJECT=default_gcp_project \
  -cp ./dbeam-core/target/dbeam-core-shaded.jar \
  com.spotify.dbeam.jobs.JdbcAvroJob \
  ...

Using as a library

To include the DBeam library in a Maven project, add the following dependency to pom.xml:

<dependency>
  <groupId>com.spotify</groupId>
  <artifactId>dbeam-core</artifactId>
  <version>${dbeam.version}</version>
</dependency>

To include the DBeam library in an SBT project, add the following dependency to build.sbt:

libraryDependencies ++= Seq(
  "com.spotify" % "dbeam-core" % dbeamVersion
)

Development

Make sure you have mvn installed. IntelliJ IDEA is the recommended editor.

To test and verify changes during development, run:

mvn verify

Or:

mvn verify -Pcoverage

This project adheres to the Open Code of Conduct. By participating, you are expected to honor this code.

Release

Every push to master will deploy a snapshot version to Sonatype. You can check the deployment in the following links:

Future roadmap

DBeam is mature, maintained and used in production since August 2017. No major features or development planned. Like Redis/Redict, DBeam can be considered a finished product.

It can be maintained for decades to come with minimal effort. It can continue to provide a high amount of value for a low amount of labor.


License

Copyright 2016-2022 Spotify AB.

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0


dbeam's People

Contributors

anish749, astricks, dependabot-preview[bot], dependabot[bot], farzad-sedghi, fjgal, hlagosp, labianchin, loisaidasam, mattfinkel, perploug, rulle-io, rulle-sp, tfoldi, varjoranta


dbeam's Issues

Set table and field doc from postgres COMMENT

Postgres has a non-standard SQL concept called comment:

https://www.postgresql.org/docs/current/sql-comment.html

It shows up when you type \d+ in psql and in a few other places, and it is a pretty nice way of documenting your database within the database.

It would be nice if these comments could be used to populate the Avro doc fields. It will obviously be quite hard to figure out column descriptions if you run dbeam with --sqlFile, but if you run with --table it should be easier. This code could either always probe the database for comments or only be activated by a flag.

If I have time later, I will try to add a pull-request, just wanted to first post my idea/feature request.

Support for query

Hello,

Does dbeam support passing in queries as an option? I can pass in a table name, but what if I want to get the Avro output of a query (a table joined with other tables, or only selected attributes)?
If you don't support it, are there any plans to support it in the future?

Remove slick

slick is used to create test fixtures.

Since its usage is very small, there is an opportunity to simply remove it and use plain SQL queries instead.

Leaving the issue here for discussion, and also in case somebody wants to take a stab at this.

Support for parallel exports

Introduce support for running a single export as multiple SQL queries.

There might be multiple ways to support that:

  1. User provides a way to split the exports. e.g. split id column on ranges [0, 10000000] , [10000001, 20000000], ...
  2. User specify parallelism, automatically detect splits. e.g. SELECT min(id) as min_id, max(id) as max_id.

Sonatype release from travis

Release is currently manual and does not run in travis.

To release make sure to:

git clone git@github.com:spotify/dbeam.git dbeam_release1
cd $_
git config user.email $USER@users.noreply.github.com
git checkout master && git branch --set-upstream-to=origin/master master
time sbt '+publishLocalSigned' 'release cross with-defaults'

We should investigate having the release performed by travis.

A few references:

https://alexn.org/blog/2017/08/16/automatic-releases-sbt-travis.html
https://github.com/scalacenter/sbt-release-early/wiki/How-to-release-in-Travis-(CI)
https://github.com/making/travis-ci-maven-deploy-skelton
https://github.com/spotify/spydra/blob/master/.travis.yml#L11

SQLException: Zero date value prohibited

Hi! First of all, thank you for this project. I think it fills really well the need of migrating data to GCS now that so many companies are trying to move to the cloud.

I found this issue when replicating a table from MySQL: SQLException: Zero date value prohibited

I understand that Avro wouldn't allow such a date the way MySQL does, but there must be a way of replicating this data. Maybe by allowing the query to be overridden for specific columns? A command line option like this is one potential way of being able to replicate this table:
--override-query-column=activity_date:LEAST(activity_date, '1970-01-01')

Would this be possible to add to dbeam? Thanks!
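
The requested --override-query-column flag does not exist in DBeam; purely as a sketch of what such a feature might generate, the override could be applied when building the SELECT list. All names below are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the requested column-override feature: replace a
// column reference with an expression (aliased back to the column name) in
// the generated SELECT. Neither this flag nor these names exist in DBeam.
public class ColumnOverrideSketch {

  static String selectWithOverrides(String table, String[] columns,
                                    Map<String, String> overrides) {
    StringBuilder sb = new StringBuilder("SELECT ");
    for (int i = 0; i < columns.length; i++) {
      String col = columns[i];
      String expr = overrides.getOrDefault(col, col);
      sb.append(expr.equals(col) ? col : expr + " AS " + col);
      if (i < columns.length - 1) {
        sb.append(", ");
      }
    }
    return sb.append(" FROM ").append(table).toString();
  }

  public static void main(String[] args) {
    Map<String, String> overrides = new LinkedHashMap<>();
    overrides.put("activity_date", "LEAST(activity_date, '1970-01-01')");
    System.out.println(selectWithOverrides(
        "my_table", new String[] {"id", "activity_date"}, overrides));
  }
}
```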

Rewrite tests in JUnit

DBeam is now written fully in Java, the only thing left is tests which are still written in Scala/Scalatest. We'd like to move the tests to Java/JUnit.

This depends on #12 .

Handling of unsigned int fields in MySQL

I think the computeMapping function does not properly handle unsigned int fields in MySQL. It winds up calling .getInt(..) on a value that may be greater than the int max, due to the column holding unsigned values.

https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-type-conversions.html seems to suggest that getColumnClassName rather than getColumnTypeName will return the correct type (long instead of int) in these cases, but I have not been able to confirm that.

Example error:

java.sql.SQLDataException: Value '2190526558' is outside of valid range for type java.lang.Integer 
com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:114) 
com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97) 
com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:89) 
com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:63) 
com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:73) 
com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:92) 
com.mysql.cj.jdbc.result.ResultSetImpl.getObject(ResultSetImpl.java:1382) 
com.mysql.cj.jdbc.result.ResultSetImpl.getInt(ResultSetImpl.java:786) 
com.spotify.dbeam.avro.JdbcAvroRecord.lambda$computeMapping$3(JdbcAvroRecord.java:96)
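
The root cause above can be reproduced outside JDBC: MySQL's unsigned INT holds values up to 4294967295, beyond Java's Integer range, so such values must be read as long. A minimal demonstration (not DBeam code):

```java
// Demonstrates why MySQL unsigned INT values can't go through getInt():
// 2190526558 exceeds Integer.MAX_VALUE (2147483647) but fits a Java long.
public class UnsignedIntDemo {

  static boolean fitsInJavaInt(String value) {
    try {
      Integer.parseInt(value);
      return true;
    } catch (NumberFormatException e) {
      return false; // same root cause as the SQLDataException above
    }
  }

  public static void main(String[] args) {
    String unsignedValue = "2190526558"; // valid MySQL unsigned INT
    System.out.println("as long: " + Long.parseLong(unsignedValue));
    System.out.println("fits in int: " + fitsInJavaInt(unsignedValue));
  }
}
```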

Incorrect user-supplied Avro schema (--avroSchemaFilePath) causes dbeam to produce invalid avro files.

When a user-supplied Avro schema doesn't correspond to the actual columns returned by the SQL SELECT statement, dbeam produces Avro data files that cause exceptions to be thrown when users try to read them.

Identified use-cases

1. Avro schema has fewer fields than SQL SELECT columns

 * SQL SELECT has columns: COF_NAME, plus many others ...
 * Avro schema has fields: COF_NAME (fewer than expected).
 * This scenario produces an Avro file which seems to be OK,
 * but an exception is thrown when one tries to read it.
 * org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -50

2. Avro schema has a different order of fields than SQL SELECT columns

 * SQL SELECT has columns: COF_NAME, SIZE, TOTAL
 * Avro schema has fields: TOTAL, COF_NAME, SIZE (a different order).
 * This scenario produces an Avro file which seems to be OK,
 * but an exception is thrown when one tries to read it.
 * java.lang.ArrayIndexOutOfBoundsException: Index -50 out of bounds for length 2

Compilation fails

Hello, I was trying out this tool but could not compile it on Windows. It would be great if you could help with this. I was also trying to get familiar with the implementation; if you could provide a summary of the classes I should look at, that would be great.

Thanks
Manohar

[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] dbeam-parent [pom]
[INFO] DBeam Core [jar]
[INFO]
[INFO] ----------------------< com.spotify:dbeam-parent >----------------------
[INFO] Building dbeam-parent 0.9.15-SNAPSHOT [1/2]
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ dbeam-parent ---
[INFO] Deleting C:\beam-tutorials\dbeam\target
[INFO]
[INFO] --- maven-enforcer-plugin:3.0.0-M3:enforce (enforce) @ dbeam-parent ---
[INFO]
[INFO] --- maven-enforcer-plugin:3.0.0-M3:enforce (enforce-banned-dependencies) @ dbeam-parent ---
[INFO]
[INFO] --- maven-checkstyle-plugin:3.0.0:check (validate) @ dbeam-parent ---
[INFO] Starting audit...
Audit done.
[INFO]
[INFO] --- license-maven-plugin:1.9:check-file-header (check-file-header) @ dbeam-parent ---
[WARNING] No file to scan.
[INFO]
[INFO]
[INFO] --- maven-compiler-plugin:3.8.1:compile (default) @ dbeam-parent ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-shade-plugin:3.2.3:shade (bundle-and-repackage) @ dbeam-parent ---
[INFO]
[INFO] -----------------------< com.spotify:dbeam-core >-----------------------
[INFO] Building DBeam Core 0.9.15-SNAPSHOT [2/2]
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ dbeam-core ---
[INFO] Deleting C:\beam-tutorials\dbeam\dbeam-core\target
[INFO]
[INFO] --- maven-enforcer-plugin:3.0.0-M3:enforce (enforce) @ dbeam-core ---
[INFO]
[INFO] --- maven-enforcer-plugin:3.0.0-M3:enforce (enforce-banned-dependencies) @ dbeam-core ---
[INFO]
[INFO] --- maven-checkstyle-plugin:3.0.0:check (validate) @ dbeam-core ---
[INFO] Starting audit...
Audit done.
[INFO]
[INFO] --- license-maven-plugin:1.9:check-file-header (check-file-header) @ dbeam-core ---
[INFO] Will search files to update from root C:\beam-tutorials\dbeam\dbeam-core\src\main\java
[INFO] Will search files to update from root C:\beam-tutorials\dbeam\dbeam-core\src\test\java
[INFO] Scan 44 files header done in 201.225ms.
[INFO] All files are up-to-date.
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ dbeam-core ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO]
[INFO] --- maven-compiler-plugin:3.8.1:compile (default-compile) @ dbeam-core ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 28 source files to C:\beam-tutorials\dbeam\dbeam-core\target\classes
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[23,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] No processor claimed any of these annotations: java.lang.FunctionalInterface,org.apache.beam.sdk.options.Validation.Required,com.google.auto.value.AutoValue.Builder,org.apache.beam.sdk.options.Default.String,com.google.auto.value.AutoValue,org.apache.beam.sdk.options.Default.Integer,com.google.common.annotations.VisibleForTesting,org.apache.beam.sdk.options.Default.Boolean,javax.annotation.Nullable,org.apache.beam.sdk.options.Description
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[23,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/options/AutoValue_KmsDecrypter.java:[3,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] No processor claimed any of these annotations: javax.annotation.Generated,javax.annotation.Nullable
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[23,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/options/AutoValue_KmsDecrypter.java:[3,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[23,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/options/AutoValue_KmsDecrypter.java:[3,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/args/ParallelQueryBuilder.java:[35,8] serializable class com.spotify.dbeam.args.ParallelQueryBuilder has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/args/QueryBuilder.java:[29,1] serializable class com.spotify.dbeam.args.QueryBuilder has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/args/JdbcExportArgs.java:[32,17] serializable class com.spotify.dbeam.args.JdbcExportArgs has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/args/JdbcAvroArgs.java:[33,17] serializable class com.spotify.dbeam.args.JdbcAvroArgs has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/args/QueryBuilderArgs.java:[41,17] serializable class com.spotify.dbeam.args.QueryBuilderArgs has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/args/JdbcConnectionArgs.java:[34,17] serializable class com.spotify.dbeam.args.JdbcConnectionArgs has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[98,21] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[116,50] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[155,11] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[158,15] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[165,13] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[171,18] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[171,48] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/avro/JdbcAvroRecordConverter.java:[59,27] found raw type: com.spotify.dbeam.avro.JdbcAvroRecord.SqlFunction
missing type arguments for generic class com.spotify.dbeam.avro.JdbcAvroRecord.SqlFunction<T,R>
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/avro/JdbcAvroIO.java:[85,10] serializable class com.spotify.dbeam.avro.JdbcAvroIO.JdbcAvroSink has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/avro/JdbcAvroIO.java:[105,18] serializable class com.spotify.dbeam.avro.JdbcAvroIO.JdbcAvroWriteOperation has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/jobs/NotReadyException.java:[23,8] serializable class com.spotify.dbeam.jobs.NotReadyException has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/args/AutoValue_JdbcExportArgs.java:[9,7] serializable class com.spotify.dbeam.args.AutoValue_JdbcExportArgs has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/args/AutoValue_JdbcAvroArgs.java:[8,7] serializable class com.spotify.dbeam.args.AutoValue_JdbcAvroArgs has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/options/AutoValue_KmsDecrypter.java:[24,26] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/options/AutoValue_KmsDecrypter.java:[33,16] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/options/AutoValue_KmsDecrypter.java:[74,12] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/options/AutoValue_KmsDecrypter.java:[141,22] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/options/AutoValue_KmsDecrypter.java:[199,54] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/args/AutoValue_JdbcConnectionArgs.java:[7,7] serializable class com.spotify.dbeam.args.AutoValue_JdbcConnectionArgs has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/args/AutoValue_QueryBuilderArgs.java:[9,7] serializable class com.spotify.dbeam.args.AutoValue_QueryBuilderArgs has no definition of serialVersionUID
[INFO]
[INFO] --- maven-compiler-plugin:3.8.1:compile (default) @ dbeam-core ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 28 source files to C:\beam-tutorials\dbeam\dbeam-core\target\classes
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[23,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] No processor claimed any of these annotations: java.lang.FunctionalInterface,org.apache.beam.sdk.options.Validation.Required,com.google.auto.value.AutoValue.Builder,org.apache.beam.sdk.options.Default.String,com.google.auto.value.AutoValue,org.apache.beam.sdk.options.Default.Integer,com.google.common.annotations.VisibleForTesting,org.apache.beam.sdk.options.Default.Boolean,javax.annotation.Nullable,org.apache.beam.sdk.options.Description
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[23,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/options/AutoValue_KmsDecrypter.java:[3,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] No processor claimed any of these annotations: javax.annotation.Generated,javax.annotation.Nullable
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[23,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/options/AutoValue_KmsDecrypter.java:[3,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[23,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/options/AutoValue_KmsDecrypter.java:[3,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/args/ParallelQueryBuilder.java:[35,8] serializable class com.spotify.dbeam.args.ParallelQueryBuilder has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/args/QueryBuilder.java:[29,1] serializable class com.spotify.dbeam.args.QueryBuilder has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/args/JdbcExportArgs.java:[32,17] serializable class com.spotify.dbeam.args.JdbcExportArgs has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/args/JdbcAvroArgs.java:[33,17] serializable class com.spotify.dbeam.args.JdbcAvroArgs has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/args/QueryBuilderArgs.java:[41,17] serializable class com.spotify.dbeam.args.QueryBuilderArgs has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/args/JdbcConnectionArgs.java:[34,17] serializable class com.spotify.dbeam.args.JdbcConnectionArgs has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[98,21] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[116,50] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[155,11] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[158,15] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[165,13] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[171,18] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/options/KmsDecrypter.java:[171,48] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/avro/JdbcAvroRecordConverter.java:[59,27] found raw type: com.spotify.dbeam.avro.JdbcAvroRecord.SqlFunction
missing type arguments for generic class com.spotify.dbeam.avro.JdbcAvroRecord.SqlFunction<T,R>
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/avro/JdbcAvroIO.java:[85,10] serializable class com.spotify.dbeam.avro.JdbcAvroIO.JdbcAvroSink has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/avro/JdbcAvroIO.java:[105,18] serializable class com.spotify.dbeam.avro.JdbcAvroIO.JdbcAvroWriteOperation has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/main/java/com/spotify/dbeam/jobs/NotReadyException.java:[23,8] serializable class com.spotify.dbeam.jobs.NotReadyException has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/args/AutoValue_JdbcExportArgs.java:[9,7] serializable class com.spotify.dbeam.args.AutoValue_JdbcExportArgs has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/args/AutoValue_JdbcAvroArgs.java:[8,7] serializable class com.spotify.dbeam.args.AutoValue_JdbcAvroArgs has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/options/AutoValue_KmsDecrypter.java:[24,26] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/options/AutoValue_KmsDecrypter.java:[33,16] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/options/AutoValue_KmsDecrypter.java:[74,12] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/options/AutoValue_KmsDecrypter.java:[141,22] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/options/AutoValue_KmsDecrypter.java:[199,54] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/args/AutoValue_JdbcConnectionArgs.java:[7,7] serializable class com.spotify.dbeam.args.AutoValue_JdbcConnectionArgs has no definition of serialVersionUID
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/target/generated-sources/annotations/com/spotify/dbeam/args/AutoValue_QueryBuilderArgs.java:[9,7] serializable class com.spotify.dbeam.args.AutoValue_QueryBuilderArgs has no definition of serialVersionUID
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ dbeam-core ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory C:\beam-tutorials\dbeam\dbeam-core\src\test\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.8.1:testCompile (default-testCompile) @ dbeam-core ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 16 source files to C:\beam-tutorials\dbeam\dbeam-core\target\test-classes
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/test/java/com/spotify/dbeam/options/PasswordReaderTest.java:[23,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] No processor claimed any of these annotations: com.google.auto.value.AutoValue,org.junit.Test,org.junit.AfterClass,org.junit.Ignore,org.junit.BeforeClass
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/test/java/com/spotify/dbeam/options/PasswordReaderTest.java:[23,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] No processor claimed any of these annotations: javax.annotation.Generated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/test/java/com/spotify/dbeam/options/PasswordReaderTest.java:[23,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/test/java/com/spotify/dbeam/options/PasswordReaderTest.java:[23,52] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/test/java/com/spotify/dbeam/args/ParallelQueryBuilderTest.java:[72,11] assertThat(T,org.hamcrest.Matcher<? super T>) in org.junit.Assert has been deprecated
[WARNING] /C:/beam-tutorials/dbeam/dbeam-core/src/test/java/com/spotify/dbeam/options/PasswordReaderTest.java:[77,46] com.google.api.client.googleapis.auth.oauth2.GoogleCredential in com.google.api.client.googleapis.auth.oauth2 has been deprecated
[INFO]
[INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ dbeam-core ---
[INFO]
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.spotify.dbeam.args.JdbcExportOptionsTest
[ERROR] Tests run: 31, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.302 s <<< FAILURE! - in com.spotify.dbeam.args.JdbcExportOptionsTest
[ERROR] com.spotify.dbeam.args.JdbcExportOptionsTest Time elapsed: 0.044 s <<< ERROR!
java.nio.file.FileSystemException:
C:\Users\manoh\AppData\Local\Temp\query5627387092890552426.sql: The process cannot access the file because it is being used by another process.

    at com.spotify.dbeam.args.JdbcExportOptionsTest.afterAll(JdbcExportOptionsTest.java:50)

[INFO] Running com.spotify.dbeam.args.ParallelQueryBuilderTest
[WARNING] Tests run: 6, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.005 s - in com.spotify.dbeam.args.ParallelQueryBuilderTest
[INFO] Running com.spotify.dbeam.args.QueryBuilderArgsTest
[INFO] Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.425 s - in com.spotify.dbeam.args.QueryBuilderArgsTest
[INFO] Running com.spotify.dbeam.args.QueryBuilderTest
[INFO] Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 s - in com.spotify.dbeam.args.QueryBuilderTest
[INFO] Running com.spotify.dbeam.avro.JdbcAvroRecordTest
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.642 s - in com.spotify.dbeam.avro.JdbcAvroRecordTest
[INFO] Running com.spotify.dbeam.jobs.BeamHelperTest
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.008 s - in com.spotify.dbeam.jobs.BeamHelperTest
[INFO] Running com.spotify.dbeam.jobs.BenchJdbcAvroJobTest
Summary for BenchJdbcAvroJob
Current Settings:
appName: JdbcAvroJob
avroCodec: zstandard1
avroDoc: null
avroSchemaFilePath: null
avroSchemaNamespace: dbeam_generated
blockOnRun: false
connectionUrl: jdbc:h2:mem:test3;MODE=PostgreSQL;DATABASE_TO_UPPER=false;DB_CLOSE_DELAY=-1
defaultEnvironmentConfig: null
defaultEnvironmentType: null
enforceEncodability: true
enforceImmutability: true
executions: 2
experiments: null
exportTimeout: P7D
fetchSize: 10000
gcsPerformanceMetrics: false
jobName: jdbcavrojob-manoh-0421093215-de5b5ebe
limit: null
optionsId: 43
output: C:\Users\manoh\AppData\Local\Temp\jdbc-export-args-test-2ec1dd9a-2718-4204-8546-72d48eed70cb
partition: null
partitionColumn: null
partitionPeriod: null
password: null
passwordFile: null
passwordFileKmsEncrypted: null
preCommand: null
queryParallelism: null
runner: class org.apache.beam.runners.direct.DirectRunner
skipPartitionCheck: true
splitColumn: null
sqlFile: null
stableUniqueNames: WARNING
table: COFFEES
targetParallelism: 1
useAvroLogicalTypes: false
username:

name    recordCount  writeElapsedMs  msPerMillionRows  bytesWritten  KbWritePerSec
run_00  2            158             79000000          2221          14
run_01  2            0               0                 2221          -1
max     2.0          158.0           79000000.0        2221.0        14.0
mean    2.0          79.0            39500000.0        2221.0        6.5
min     2.0          0.0             0.0               2221.0        -1.0
stddev  0.0          79.0            39500000.0        0.0           7.5
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.788 s - in com.spotify.dbeam.jobs.BenchJdbcAvroJobTest
[INFO] Running com.spotify.dbeam.jobs.JdbcAvroJobTest
[INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.099 s - in com.spotify.dbeam.jobs.JdbcAvroJobTest
[INFO] Running com.spotify.dbeam.jobs.PsqlAvroJobTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 s - in com.spotify.dbeam.jobs.PsqlAvroJobTest
[INFO] Running com.spotify.dbeam.jobs.PsqlReplicationCheckTest
[INFO] Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.249 s - in com.spotify.dbeam.jobs.PsqlReplicationCheckTest
[INFO] Running com.spotify.dbeam.options.InputAvroSchemaTest
[ERROR] Tests run: 9, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 0.002 s <<< FAILURE! - in com.spotify.dbeam.options.InputAvroSchemaTest
[ERROR] com.spotify.dbeam.options.InputAvroSchemaTest Time elapsed: 0 s <<< ERROR!
java.nio.file.FileSystemException:
C:\Users\manoh\AppData\Local\Temp\dataType1134900135046375457.avsc: The process cannot access the file because it is being used by another process.

    at com.spotify.dbeam.options.InputAvroSchemaTest.afterAll(InputAvroSchemaTest.java:109)

[INFO] Running com.spotify.dbeam.options.JobNameConfigurationTest
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 s - in com.spotify.dbeam.options.JobNameConfigurationTest
[INFO] Running com.spotify.dbeam.options.PasswordReaderTest
[ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.023 s <<< FAILURE! - in com.spotify.dbeam.options.PasswordReaderTest
[ERROR] com.spotify.dbeam.options.PasswordReaderTest Time elapsed: 0.016 s <<< ERROR!
java.nio.file.FileSystemException:
C:\Users\manoh\AppData\Local\Temp\pattern4982662895807376452.suffix: The process cannot access the file because it is being used by another process.

    at com.spotify.dbeam.options.PasswordReaderTest.afterAll(PasswordReaderTest.java:53)

[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] JdbcExportOptionsTest.afterAll:50 » FileSystem C:\Users\manoh\AppData\Local\Te...
[ERROR] InputAvroSchemaTest.afterAll:109 » FileSystem C:\Users\manoh\AppData\Local\Tem...
[ERROR] PasswordReaderTest.afterAll:53 » FileSystem C:\Users\manoh\AppData\Local\Temp...
[INFO]
[ERROR] Tests run: 111, Failures: 0, Errors: 3, Skipped: 2
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for dbeam-parent 0.9.15-SNAPSHOT:
[INFO]
[INFO] dbeam-parent ....................................... SUCCESS [ 3.505 s]
[INFO] DBeam Core ......................................... FAILURE [ 27.551 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 31.247 s
[INFO] Finished at: 2020-04-21T15:02:23+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M4:test (default-test) on project dbeam-core: There are test failures.
[ERROR]
[ERROR] Please refer to C:\beam-tutorials\dbeam\dbeam-core\target\surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :dbeam-core

Associate Struct (Composite/Complex types) with Record instead of String

Currently, dbeam translates Structs as Strings:

    "name" : "logo",
    "type" : [ "null", "string" ],
    "doc" : "From sqlType 2002 STRUCT",
    "default" : null,
    "typeName" : "STRUCT",
    "sqlCode" : "2002",
    "columnName" : "logo"

Avro supports nested records/structs, and the type information is available in the PostgreSQL schema as well, so this association seems possible in theory.

So I wonder: is it possible to extend the dbeam implementation so that Structs are mapped to Records in the generated schema?
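For illustration, a nested mapping for the logo column above could look roughly like this (the record's field names and types are hypothetical, since they would depend on the struct's actual attributes):

```json
{
  "name" : "logo",
  "type" : [ "null", {
    "type" : "record",
    "name" : "logo_struct",
    "fields" : [
      { "name" : "width", "type" : [ "null", "int" ],    "default" : null },
      { "name" : "url",   "type" : [ "null", "string" ], "default" : null }
    ]
  } ],
  "default" : null,
  "columnName" : "logo"
}
```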

Failed to run on Google Cloud Dataflow when built with Java 10

I can't figure out how to run DBeam on Google Cloud Dataflow. I dug through Scio's docs and tried to run DBeam with the Dataflow runner in the sbt shell. I had to add the following line to build.sbt:
"org.apache.beam" % "beam-runners-google-cloud-dataflow-java" % beamVersion,
under libraryDependencies ++= Seq(, but still got the errors:

hil@macbook13i72017 ~/c/dbeam> sbt
[info] Loading settings from idea.sbt ...
[info] Loading global plugins from /Users/hil/.sbt/1.0/plugins
[info] Loading settings from plugins.sbt ...
[info] Loading project definition from /Users/hil/cbsi/dbeam/project
[info] Loading settings from version.sbt,build.sbt ...
[info] Set current project to dbeam-foss-parent (in build file:/Users/hil/cbsi/dbeam/)
[info] sbt server started at local:///Users/hil/.sbt/1.0/server/b6db3491d7efae758331/sock
sbt:dbeam-foss-parent> project dbeamCore
[info] Set current project to dbeam-core (in build file:/Users/hil/cbsi/dbeam/)
sbt:dbeam-core> runMain com.spotify.dbeam.JdbcAvroJob --project=i-ingest-poc --zone=us-west1-c --runner=DataflowRunner --connectionUrl=jdbc:mysql://localhost:3306/dbeamtest --table=pet --username=hil --password=password --output=gs://dbeam-test/tmp
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
[info] Running (fork) com.spotify.dbeam.JdbcAvroJob --project=i-ingest-poc --zone=us-west1-c --runner=DataflowRunner --connectionUrl=jdbc:mysql://localhost:3306/dbeamtest --table=pet --username=hil --password=password --output=gs://dbeam-test/tmp
[error] Wed May 23 17:21:17 PDT 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[error] [main] INFO JdbcAvroConversions - Creating Avro schema based on the first read row from the database
[error] [main] INFO JdbcAvroConversions - Schema created successfully. Generated schema: {"type":"record","name":"pet","namespace":"dbeam_generated","doc":"Generate schema from JDBC ResultSet from 'pet' or the --sqlFile with jdbc:mysql://localhost:3306/dbeamtest","fields":[{"name":"name","type":["null","string"],"doc":"From sqlType 12 VARCHAR","default":null,"typeName":"VARCHAR","sqlCode":"12","columnName":"name"},{"name":"owner","type":["null","string"],"doc":"From sqlType 12 VARCHAR","default":null,"typeName":"VARCHAR","sqlCode":"12","columnName":"owner"},{"name":"species","type":["null","string"],"doc":"From sqlType 12 VARCHAR","default":null,"typeName":"VARCHAR","sqlCode":"12","columnName":"species"},{"name":"sex","type":["null","string"],"doc":"From sqlType 1 CHAR","default":null,"typeName":"CHAR","sqlCode":"1","columnName":"sex"},{"name":"birth","type":["null","long"],"doc":"From sqlType 91 DATE","default":null,"typeName":"DATE","sqlCode":"91","columnName":"birth"},{"name":"death","type":["null","long"],"doc":"From sqlType 91 DATE","default":null,"typeName":"DATE","sqlCode":"91","columnName":"death"}],"connectionUrl":"jdbc:mysql://localhost:3306/dbeamtest","tableName":"pet"}
[error] [main] INFO com.spotify.dbeam.JdbcAvroJob$ - Elapsed time to schema 0.585 seconds
[error] Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Current ClassLoader is 'jdk.internal.loader.ClassLoaders$AppClassLoader@4b9af9a9' only URLClassLoaders are supported
[error] at scala.Predef$.require(Predef.scala:277)
[error] at com.spotify.scio.runners.dataflow.DataflowContext$.detectClassPathResourcesToStage(DataflowContext.scala:58)
[error] at com.spotify.scio.runners.dataflow.DataflowContext$.getFilesToStage(DataflowContext.scala:49)
[error] at com.spotify.scio.runners.dataflow.DataflowContext$.prepareOptions(DataflowContext.scala:39)
[error] at com.spotify.scio.RunnerContext$.prepareOptions(ScioContext.scala:104)
[error] at com.spotify.scio.ScioContext.pipeline(ScioContext.scala:287)
[error] at com.spotify.scio.ScioContext$$anonfun$parallelize$1.apply(ScioContext.scala:857)
[error] at com.spotify.scio.ScioContext$$anonfun$parallelize$1.apply(ScioContext.scala:856)
[error] at com.spotify.scio.ScioContext.requireNotClosed(ScioContext.scala:419)
[error] at com.spotify.scio.ScioContext.parallelize(ScioContext.scala:856)
[error] at com.spotify.dbeam.JdbcAvroJob$.createSchema(JdbcAvroJob.scala:63)
[error] at com.spotify.dbeam.JdbcAvroJob$.prepareExport(JdbcAvroJob.scala:131)
[error] at com.spotify.dbeam.JdbcAvroJob$.runExport(JdbcAvroJob.scala:151)
[error] at com.spotify.dbeam.JdbcAvroJob$.main(JdbcAvroJob.scala:160)
[error] at com.spotify.dbeam.JdbcAvroJob.main(JdbcAvroJob.scala)
[error] java.lang.RuntimeException: Nonzero exit code returned from runner: 1
[error] at sbt.ForkRun.processExitCode$1(Run.scala:33)
[error] at sbt.ForkRun.run(Run.scala:42)
[error] at sbt.Defaults$.$anonfun$bgRunMainTask$6(Defaults.scala:1147)
[error] at sbt.Defaults$.$anonfun$bgRunMainTask$6$adapted(Defaults.scala:1142)
[error] at sbt.internal.BackgroundThreadPool.$anonfun$run$1(DefaultBackgroundJobService.scala:366)
[error] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
[error] at scala.util.Try$.apply(Try.scala:209)
[error] at sbt.internal.BackgroundThreadPool$BackgroundRunnable.run(DefaultBackgroundJobService.scala:289)
[error] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
[error] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[error] at java.base/java.lang.Thread.run(Thread.java:844)
[error] (Compile / runMain) Nonzero exit code returned from runner: 1
[error] Total time: 7 s, completed May 23, 2018, 5:21:18 PM

The error was Current ClassLoader is 'jdk.internal.loader.ClassLoaders$AppClassLoader@4b9af9a9' only URLClassLoaders are supported. This is likely because, since Java 9, the application class loader is no longer a URLClassLoader, which Scio's Dataflow classpath detection assumed at the time; building and running with Java 8 avoids it.

Changing --runner to DirectRunner succeeded:

sbt:dbeam-core> runMain com.spotify.dbeam.JdbcAvroJob --project=i-ingest-poc --zone=us-west1-c --runner=DirectRunner --connectionUrl=jdbc:mysql://localhost:3306/dbeamtest --table=pet --username=hil --password=password --output=gs://dbeam-test/tmp
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
[info] Running (fork) com.spotify.dbeam.JdbcAvroJob --project=i-ingest-poc --zone=us-west1-c --runner=DirectRunner --connectionUrl=jdbc:mysql://localhost:3306/dbeamtest --table=pet --username=hil --password=password --output=gs://dbeam-test/tmp
[error] Wed May 23 17:28:20 PDT 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[error] [main] INFO JdbcAvroConversions - Creating Avro schema based on the first read row from the database
[error] [main] INFO JdbcAvroConversions - Schema created successfully. Generated schema: {"type":"record","name":"pet","namespace":"dbeam_generated","doc":"Generate schema from JDBC ResultSet from 'pet' or the --sqlFile with jdbc:mysql://localhost:3306/dbeamtest","fields":[{"name":"name","type":["null","string"],"doc":"From sqlType 12 VARCHAR","default":null,"typeName":"VARCHAR","sqlCode":"12","columnName":"name"},{"name":"owner","type":["null","string"],"doc":"From sqlType 12 VARCHAR","default":null,"typeName":"VARCHAR","sqlCode":"12","columnName":"owner"},{"name":"species","type":["null","string"],"doc":"From sqlType 12 VARCHAR","default":null,"typeName":"VARCHAR","sqlCode":"12","columnName":"species"},{"name":"sex","type":["null","string"],"doc":"From sqlType 1 CHAR","default":null,"typeName":"CHAR","sqlCode":"1","columnName":"sex"},{"name":"birth","type":["null","long"],"doc":"From sqlType 91 DATE","default":null,"typeName":"DATE","sqlCode":"91","columnName":"birth"},{"name":"death","type":["null","long"],"doc":"From sqlType 91 DATE","default":null,"typeName":"DATE","sqlCode":"91","columnName":"death"}],"connectionUrl":"jdbc:mysql://localhost:3306/dbeamtest","tableName":"pet"}
[error] [main] INFO com.spotify.dbeam.JdbcAvroJob$ - Elapsed time to schema 0.726 seconds
[error] [main] INFO com.spotify.dbeam.JdbcAvroJob$ - Running queries: List(SELECT * FROM pet)
[error] WARNING: An illegal reflective access operation has occurred
[error] WARNING: Illegal reflective access by org.apache.beam.runners.direct.repackaged.com.google.protobuf.UnsafeUtil (file:/private/var/folders/q3/rf49by096192ckdl4j7dt56w0000gn/T/sbt_d9330199/target/b187293c/beam-runners-direct-java-2.4.0.jar) to field java.nio.Buffer.address
[error] WARNING: Please consider reporting this to the maintainers of org.apache.beam.runners.direct.repackaged.com.google.protobuf.UnsafeUtil
[error] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
[error] WARNING: All illegal access operations will be denied in a future release
[error] [direct-runner-worker] INFO org.apache.beam.sdk.io.WriteFiles - Opening writer 191b3dde-2394-4c6b-a022-5bab0f73dd00 for window org.apache.beam.sdk.transforms.windowing.GlobalWindow@fe7b6b0 pane PaneInfo{isFirst=true, isLast=true, timing=ON_TIME, index=0, onTimeIndex=0} destination null
[error] [direct-runner-worker] INFO com.spotify.dbeam.JdbcAvroIO$JdbcAvroWriter - jdbcavroio : Preparing write...
[error] Wed May 23 17:28:23 PDT 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[error] Wed May 23 17:28:23 PDT 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[error] [direct-runner-worker] INFO com.spotify.dbeam.JdbcAvroIO$JdbcAvroWriter - jdbcavroio : Write prepared
[error] [direct-runner-worker] INFO com.spotify.dbeam.JdbcAvroIO$JdbcAvroWriter - jdbcavroio : Starting write...
[error] [direct-runner-worker] INFO com.spotify.dbeam.JdbcAvroIO$JdbcAvroWriter - jdbcavroio : Executing query (this can take a few minutes) ...
[error] [direct-runner-worker] INFO com.spotify.dbeam.JdbcAvroIO$JdbcAvroWriter - jdbcavroio : Execute query took 0.01 seconds
[error] [direct-runner-worker] INFO com.spotify.dbeam.JdbcAvroIO$JdbcAvroWriter - jdbcavroio : Read 1 rows, took 0.01 seconds
[error] [direct-runner-worker] INFO com.spotify.dbeam.JdbcAvroIO$JdbcAvroWriter - jdbcavroio : Closing connection, flushing writer...
[error] [direct-runner-worker] INFO com.spotify.dbeam.JdbcAvroIO$JdbcAvroWriter - jdbcavroio : Write finished
[error] [direct-runner-worker] INFO org.apache.beam.sdk.io.FileBasedSink$Writer - Successfully wrote temporary file gs://dbeam-test/tmp/.temp-beam-2018-05-24_00-28-22-1/191b3dde-2394-4c6b-a022-5bab0f73dd00
[error] [direct-runner-worker] INFO org.apache.beam.sdk.io.WriteFiles - Finalizing 1 file results
[error] [direct-runner-worker] INFO org.apache.beam.sdk.io.FileBasedSink - Finalizing for destination null num shards 1.
[error] [direct-runner-worker] INFO org.apache.beam.sdk.io.FileBasedSink - Will copy temporary file FileResult{tempFilename=gs://dbeam-test/tmp/.temp-beam-2018-05-24_00-28-22-1/191b3dde-2394-4c6b-a022-5bab0f73dd00, shard=0, window=org.apache.beam.sdk.transforms.windowing.GlobalWindow@fe7b6b0, paneInfo=PaneInfo{isFirst=true, isLast=true, timing=ON_TIME, index=0, onTimeIndex=0}} to final location gs://dbeam-test/tmp/part-00000-of-00001.avro
[error] [direct-runner-worker] INFO org.apache.beam.sdk.io.FileBasedSink - Will remove known temporary file gs://dbeam-test/tmp/.temp-beam-2018-05-24_00-28-22-1/191b3dde-2394-4c6b-a022-5bab0f73dd00
[error] [main] INFO com.spotify.dbeam.JdbcAvroJob$ - Metrics Metrics(0.5.4,2.12.4,JdbcAvroJob,DONE,BeamMetrics(List(BeamMetric(com.spotify.scio.ScioMetrics,schemaElapsedTimeMs,MetricValue(726,Some(726))), BeamMetric(com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter,writeElapsedMs,MetricValue(7,Some(7))), BeamMetric(com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter,recordCount,MetricValue(1,Some(1))), BeamMetric(com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter,executeQueryElapsedMs,MetricValue(9,Some(9)))),List(),List(BeamMetric(com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter,msPerMillionRows,MetricValue(BeamGauge(7000000,2018-05-24T00:28:23.895Z),Some(BeamGauge(7000000,2018-05-24T00:28:23.895Z)))), BeamMetric(com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter,rowsPerMinute,MetricValue(BeamGauge(8571,2018-05-24T00:28:23.895Z),Some(BeamGauge(8571,2018-05-24T00:28:23.895Z)))))))
[error] [main] INFO com.spotify.dbeam.JdbcAvroJob$ - all counters and gauges Map(MetricName{namespace=com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter, name=rowsPerMinute} -> MetricValue(GaugeResult{value=8571, timestamp=2018-05-24T00:28:23.895Z},Some(GaugeResult{value=8571, timestamp=2018-05-24T00:28:23.895Z})), MetricName{namespace=com.spotify.scio.ScioMetrics, name=schemaElapsedTimeMs} -> MetricValue(726,Some(726)), MetricName{namespace=com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter, name=recordCount} -> MetricValue(1,Some(1)), MetricName{namespace=com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter, name=executeQueryElapsedMs} -> MetricValue(9,Some(9)), MetricName{namespace=com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter, name=writeElapsedMs} -> MetricValue(7,Some(7)), MetricName{namespace=com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter, name=msPerMillionRows} -> MetricValue(GaugeResult{value=7000000, timestamp=2018-05-24T00:28:23.895Z},Some(GaugeResult{value=7000000, timestamp=2018-05-24T00:28:23.895Z})))
[success] Total time: 12 s, completed May 23, 2018, 5:28:27 PM
sbt:dbeam-core>

Any support to write Avro back into MySQL or PostGres?

Hello,

Would any of you know how we can load the Avro files back into a table in MySQL or Postgres? I see it's probably not supported here, but is there any chance you know of another repo that supports that?

Thanks

Add support for executing pre commands

Add a command line argument, like --preCommand="SET worker_mem='4GB';" --preCommand="SET random_page_cost=1.1;", where the session can be configured before executing the extraction query.

We can name the argument/option --preCommand, --preExecute, or something else.

Support for Java 11

Is there any particular reason why dbeam releases the library as a Java 8 artifact?

dbeam/pom.xml

Line 101 in 1668c7d

<maven.compiler.release>8</maven.compiler.release>


Especially given the fact that dbeam requires at least JDK 11 for compilation:

dbeam/pom.xml

Lines 532 to 534 in 1668c7d

<requireJavaVersion>
<version>[11,)</version>
</requireJavaVersion>


Can't generate a self-executable fat JAR

Hello. I'm trying to generate a self-executable fat JAR in order to use DBeam with Apache Airflow (using DataflowOperator, which requires it in that format). I've added the sbt-assembly plugin and tried many different merge strategies, but although the fat JAR is built, it doesn't contain a main class in the MANIFEST file.

I added the following to project/plugins.sbt:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.7")

And also this to build.sbt:
mainClass in assembly := Some("com.spotify.dbeam.JdbcAvroJob"),

When running with java -jar I get:
no main manifest attribute, in dbeam-core/target/scala-2.12/dbeam-core_2.12-0.4.1-SNAPSHOT.jar
The same happens with the "pack" JAR.

Disclaimer: I'm not a Scala/SBT expert :-)

allow output to be a bigquery table

It would be great if dbeam allowed direct streaming into BQ. The current approach requires a second job to accomplish this, loading from GCS to BQ.

readme needs improvement

  1. The Scio reference in the README is broken.
  2. The examples can't be run easily. After changing CLASS_PATH dbeam-core_2.12.jar to dbeam/target/scala-2.12/*.jar in the command below, I got the error Error: Unable to initialize main class com.spotify.dbeam.JdbcAvroJob
    Caused by: java.lang.NoClassDefFoundError: org/slf4j/Logger
java -cp CLASS_PATH dbeam-core_2.12.jar com.spotify.dbeam.JdbcAvroJob \
  --output=gs://my-testing-bucket-name/ \
  --username=my_database_username \
  --password=secret \
  --connectionUrl=jdbc:postgresql://some.database.uri.example.org:5432/my_database \
  --table=my_table

Can you provide more concrete examples? I am trying to figure out how to run it just from reading ./dbeam-pack/target/pack/bin/jdbc-avro-job. An example of how to run it on Google Cloud Dataflow with mvn compile exec:java would be nice.

  1. Instead of running ./dbeam-pack/target/pack/bin/jdbc-avro-job, how about a full example?
bash -v ./dbeam-pack/target/pack/bin/jdbc-avro-job --connectionUrl=jdbc:postgresql://some.database.uri.example.org:5432/my_database --table=my_table --output=gs://my-testing-bucket-name/  --username=my_database_username --password=secret

Support for "ceiling" filtering/partitioning

We do a daily DB table dump and would like to include data from the partition date or earlier (but not later), so DB dumps are actually reproducible/deterministic.

So, is it possible to achieve a configuration like the one below?
Parameters: "--table=some_table --partition=2027-07-31 --partitionColumn=col" + some other option (?)
=>
SQL: "SELECT * FROM some_table WHERE 1=1 AND col < '2027-08-01'"

P.S. It is probably possible to achieve this using a user-provided SQL file plus some parsing of the partition value,
but it would be much simpler to employ dedicated parameter(s).
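The query construction this request implies can be sketched as follows: given a partition date, filter strictly below the next day so the dump is reproducible. This is a standalone illustration, not DBeam's actual query builder; the table and column names are from the example above.

```java
import java.time.LocalDate;

public class CeilingFilter {

  /** Builds a dump query that only keeps rows strictly before the day after the partition. */
  public static String buildQuery(String table, String partitionColumn, LocalDate partition) {
    LocalDate ceiling = partition.plusDays(1); // exclusive upper bound
    return String.format(
        "SELECT * FROM %s WHERE 1=1 AND %s < '%s'", table, partitionColumn, ceiling);
  }

  public static void main(String[] args) {
    System.out.println(buildQuery("some_table", "col", LocalDate.parse("2027-07-31")));
    // SELECT * FROM some_table WHERE 1=1 AND col < '2027-08-01'
  }
}
```

Using an exclusive upper bound on the next day (rather than <= on the partition date) keeps timestamp columns with sub-day precision inside the partition.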

NoClassDefFoundError: com/google/api/client/json/gson/GsonFactory

Hi there! Forgive me if this is a simple issue, but I'm not familiar with Java/JVM. I built the library and ran it as described here: https://github.com/spotify/dbeam#usage-examples.

DBeam managed to create the Avro schema, but failed at the next step. It looks like a class or library is missing? Would you know how to add it to the project, perhaps via an additional command line argument or by installing it via Maven?

Thanks!

[main] INFO com.spotify.dbeam.avro.BeamJdbcAvroSchema - Elapsed time to schema 0.208 seconds
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/api/client/json/gson/GsonFactory
        at com.google.auth.oauth2.GoogleCredentials.getApplicationDefault(GoogleCredentials.java:92)
        at org.apache.beam.sdk.extensions.gcp.auth.GcpCredentialFactory.getCredential(GcpCredentialFactory.java:63)
        at org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpUserCredentialsFactory.create(GcpOptions.java:268)
        at org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpUserCredentialsFactory.create(GcpOptions.java:257)
        at org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:605)
        at org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:546)
        at org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:171)
        at com.sun.proxy.$Proxy22.getGcpCredential(Unknown Source)
        at org.apache.beam.sdk.extensions.gcp.util.Transport.newStorageClient(Transport.java:98)
        at org.apache.beam.sdk.extensions.gcp.util.GcsUtil$GcsUtilFactory.create(GcsUtil.java:104)
        at org.apache.beam.sdk.extensions.gcp.util.GcsUtil$GcsUtilFactory.create(GcsUtil.java:93)
        at org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:605)
        at org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:546)
        at org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:171)
        at com.sun.proxy.$Proxy22.getGcsUtil(Unknown Source)
        at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.create(GcsFileSystem.java:135)
        at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.create(GcsFileSystem.java:61)
        at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:244)
        at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:231)
        at com.spotify.dbeam.beam.BeamHelper.writeToFile(BeamHelper.java:80)
        at com.spotify.dbeam.beam.BeamHelper.saveStringOnSubPath(BeamHelper.java:88)
        at com.spotify.dbeam.jobs.JdbcAvroJob.prepareExport(JdbcAvroJob.java:119)
        at com.spotify.dbeam.jobs.JdbcAvroJob.runExport(JdbcAvroJob.java:164)
        at com.spotify.dbeam.jobs.JdbcAvroJob.main(JdbcAvroJob.java:175)
Caused by: java.lang.ClassNotFoundException: com.google.api.client.json.gson.GsonFactory
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
        ... 24 more

Add support for MS SQL Server

Currently this DB is not supported as a source, but it is widely used. It would be great to have the option of reading from MS SQL Server tables as well.

Writing output into AWS S3 does not work

I understand dbeam officially does not support S3. I was kind of hoping it might work out of the box, since Beam supports S3 in its file IO. However, I got the following error:

"No filesystem found for scheme s3"

Do you have a clue why this is happening? Do you plan to add S3 output in the future?

--sqlFile not working with PostgreSQL databases

When using the command line option --sqlFile the generated query is not compatible with PostgreSQL.

It fails with:

ERROR: subquery in FROM must have an alias
  Hint: For example, FROM (SELECT ...) [AS] foo.
  Position: 15
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2468)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2211)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:309)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:446)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:370)
	at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:311)
	at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:297)
	at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:274)
	at org.postgresql.jdbc.PgStatement.executeQuery(PgStatement.java:225)
	at com.spotify.dbeam.avro.JdbcAvroSchema.createSchemaByReadingOneRow(JdbcAvroSchema.java:83)
	at com.spotify.dbeam.avro.BeamJdbcAvroSchema.createSchema(BeamJdbcAvroSchema.java:55)
	at com.spotify.dbeam.jobs.JdbcAvroJob.prepareExport(JdbcAvroJob.java:101)
	at com.spotify.dbeam.jobs.JdbcAvroJob.runExport(JdbcAvroJob.java:145)
	at com.spotify.

The problem is that the template used by the query builder is not setting an alias for the subquery:

    public String getBaseSql() {
      return String.format("%s FROM (%s) %s",
              selectClause, userSqlQuery, DEFAULT_WHERE_CLAUSE);
    }

instead it should be something like:

    public String getBaseSql() {
      return String.format("%s FROM (%s) as user_sql_query %s",
              selectClause, userSqlQuery, DEFAULT_WHERE_CLAUSE);
    }

I'll submit a PR.

BTW, great project!
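To illustrate the fix, here is a standalone sketch of the corrected formatting: the user's SQL is wrapped as an aliased subquery, which PostgreSQL requires. The select and where clauses are placeholders, not DBeam's actual values.

```java
public class BaseSqlDemo {

  /** Wraps the user query as an aliased subquery so PostgreSQL accepts it. */
  public static String getBaseSql(String selectClause, String userSqlQuery, String whereClause) {
    return String.format("%s FROM (%s) as user_sql_query %s",
        selectClause, userSqlQuery, whereClause);
  }

  public static void main(String[] args) {
    System.out.println(getBaseSql("SELECT *", "SELECT id FROM t", "WHERE 1=1"));
    // SELECT * FROM (SELECT id FROM t) as user_sql_query WHERE 1=1
  }
}
```

Without the `as user_sql_query` alias, PostgreSQL rejects the statement with "subquery in FROM must have an alias", exactly as in the stack trace above.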

Provide descriptions for fields

Hello. I have a requirement to have descriptions on the fields of the tables in BigQuery. When loading an Avro export from DBeam into BQ, the descriptions are set to a hard-coded sentence that DBeam generates, such as:

From sqlType 4 INTEGER

I need a way to set the descriptions myself. What would be, in your opinion, the best way to accomplish this in DBeam? I can prepare a PR. Thank you.
