Git Product home page Git Product logo

sql-runner's Introduction

SQL Runner

Build Status Coveralls Go Report Card Release License

Overview

Run playbooks of SQL scripts in series and parallel on Snowflake DB, Amazon Redshift and PostgreSQL.

Used with Snowplow for scheduled SQL-based transformations of event stream data.

Setup Guide User Guide
i1 i2

Quick start

Assuming go, docker and docker-compose are installed:

 host> git clone https://github.com/snowplow/sql-runner
 host> cd sql-runner
 host> make setup-up    # Launches Consul + Postgres for testing
 host> make             # Builds sql-runner binaries
 host> make test        # Runs unit tests

 # DISTRO specifies which binary you want to run integration tests with
 host> DISTRO=darwin make integration

Note: You will need to ensure that ~/go/bin is on your PATH for gox to work - the underlying tool that we use for building the binaries.

When you are done with testing simply execute make setup-down to terminate the docker-compose stack.

To reset the testing resources execute make setup-reset which will rebuild the docker containers. This can be useful if the state of these systems gets out of sync with what the tests expect.

To remove all build files:

guest> make clean

To format the golang code in the source directory:

guest> make format

Note: Always run make format before submitting any code.

Note: The make test command also generates a code coverage file which can be found at build/coverage/coverage.html.

How to use?

First either compile the binary from source using the above make command or download the published Binary directly from the GitHub release:

CLI Output

sql-runner version: 0.10.1
Run playbooks of SQL scripts in series and parallel on Redshift and Postgres
Usage:
  -checkLock string
    	Checks whether the lockfile already exists
  -consul string
    	The address of a consul server with playbooks and SQL files stored in KV pairs
  -consulOnlyForLock
    	Will read playbooks locally, but use Consul for locking.
  -deleteLock string
    	Will attempt to delete a lockfile if it exists
  -dryRun
    	Runs through a playbook without executing any of the SQL
  -fillTemplates
    	Will print all queries after templates are filled
  -fromStep string
    	Starts from a given step defined in your playbook
  -help
    	Shows this message
  -lock string
    	Optional argument which checks and sets a lockfile to ensure this run is a singleton. Deletes lock on run completing successfully
  -playbook string
    	Playbook of SQL scripts to execute
  -runQuery string
    	Will run a single query in the playbook
  -showQueryOutput
    	Will print all output from queries
  -softLock string
    	Optional argument, like '-lock' but the lockfile will be deleted even if the run fails
  -sqlroot string
    	Absolute path to SQL scripts. Use PLAYBOOK, BINARY and PLAYBOOK_CHILD for those respective paths (default "PLAYBOOK")
  -var value
    	Variables to be passed to the playbook, in the key=value format
  -version
    	Shows the program version

Copyright and license

SQL Runner is copyright 2015-2022 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

sql-runner's People

Contributors

adatzer avatar alexanderdean avatar andrioni avatar colmsnowplow avatar dannymc129 avatar dennisatspaceape avatar jbeemster avatar laulaman avatar lritter avatar mbobrovskyi avatar mhadam avatar rzats avatar tclass avatar tiganetearobert avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

sql-runner's Issues

Fix "panic: assignment to entry in nil map"

Triggered when there is an issue with the supplied variables:

 panic: assignment to entry in nil map goroutine 1 [running]: github.com/snowplow/sql-runner/playbook.MergeCLIVariables(0xc8200e2400, 0x1, 0x1, 0x0, 0xc820118a00, 0xe, 0x10, 0xc8200dea20, 0x0, 0x0, ...) /opt/gopath/src/github.com/snowplow/sql-runner/playbook/playbook.go:49 +0x174 github.com/snowplow/sql-runner/playbook.ParsePlaybook(0x7fff43f6388f, 0x50, 0xc8200dea20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...) /opt/gopath/src/github.com/snowplow/sql-runner/playbook/playbook.go:42 +0x139 main.main() /opt/gopath/src/github.com/snowplow/sql-runner/main.go:38 +0x73 goroutine 17 [syscall, locked to thread]: runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 goroutine 5 [chan receive]: github.com/golang/glog.(*loggingT).flushDaemon(0xa83860) /opt/gopath/src/github.com/snowplow/sql-runner/Godeps/_workspace/src/github.com/golang/glog/glog.go:879 +0x67 created by github.com/golang/glog.init.1 /opt/gopath/src/github.com/snowplow/sql-runner/Godeps/_workspace/src/github.com/golang/glog/glog.go:410 +0x297

Move SQL file reading to startup

At the moment, SQL files are read "just-in-time" - i.e. when they are needed to run a query.

We should move to "eager" reading of SQL files on startup. That way, if any of the SQL files are missing, we will know at startup. Also, it protects us from e.g. deployment updates to the SQL files during execution of a playbook.

/cc @yalisassoon

sql-runner fails on IF EXISTS statements

For example, web-incremental step 00-preparation:
https://github.com/snowplow/snowplow/blob/master/5-data-modeling/sql-runner/redshift/sql/web-incremental/00-preparation/00-preparation.sql#L25

Gives this output

2015/10/07 04:12:30 EXECUTING 00-preparation (in step 00-preparation @ snowplow): /opt/snowplow/sql_runner/analytics/sql/web-incremental/00-preparation/00-preparation.sql
2015/10/07 04:12:30 FAILURE: 00-preparation (step 00-preparation @ target snowplow), ERROR: ERROR #3F000 schema "snplw_temp" does not exist: 
2015/10/07 04:12:30 
TARGET INITIALIZATION FAILURES:
QUERY FAILURES:
* Query 00-preparation /opt/snowplow/sql_runner/analytics/sql/web-incremental/00-preparation/00-preparation.sql (in step 00-preparation @ target snowplow), ERROR:
  - ERROR #3F000 schema "snplw_temp" does not exist: 

Can sql-runner be made to handle this edge case. It would also allow one to create playbooks for all initial schema and table setup.

Upgrade vagrant push to upload OS X and Windows binaries

Do you think it'd possible to push OS X binaries, either to bintray or through the GitHub release mechanism? While I do think most users are going to deploy sql-runner on Linux, it's nice to be able to test on OS X without using Vagrant.

Explore idempotent runs

From @yalisassoon:

I'm wondering if there's a way to make sql-runner idempotent? It would be nice if e.g. at each query step a flag was set, so that you could just rerun the job and it would automatically skip any steps that had already been completed?

We would need to come up with a way to:

  1. Tracker progress through a run (ZooKeeper?)
  2. Identify different executions of sql-runner as being repeated attempts to perform the same run

Issue with `godep go build` straight out of the box

I don't really have a lot of experience with go, so I might be doing something dumb, but godep go build returns the following error (while building on OS X):

main.go:18:2: cannot find package "github.com/snowplow/sql-runner/playbook" in any of:
    /usr/local/Cellar/go/1.4.2/libexec/src/github.com/snowplow/sql-runner/playbook (from $GOROOT)
    /Users/jakedust/workspace/pkg/sql-runner/Godeps/_workspace/src/github.com/snowplow/sql-runner/playbook (from $GOPATH)
    /Users/jakedust/.go/src/github.com/snowplow/sql-runner/playbook
main.go:19:2: cannot find package "github.com/snowplow/sql-runner/run" in any of:
    /usr/local/Cellar/go/1.4.2/libexec/src/github.com/snowplow/sql-runner/run (from $GOROOT)
    /Users/jakedust/workspace/pkg/sql-runner/Godeps/_workspace/src/github.com/snowplow/sql-runner/run (from $GOPATH)
    /Users/jakedust/.go/src/github.com/snowplow/sql-runner/run
godep: go exit status 1

After a go get github.com/snowplow/sql-runner, the build works fine.

Error on `vagrant up`

==> default: =======================================
==> default: CLONING ANSIBLE AND PLAYBOOKS WITH PERU
==> default: ---------------------------------------
==> default: Traceback (most recent call last):
==> default:   File "/usr/local/bin/peru", line 9, in <module>
==> default:     load_entry_point('peru==0.2.4', 'console_scripts', 'peru')()
==> default:   File "/usr/local/lib/python3.4/dist-packages/peru/main.py", line 314, in main
==> default:     scope, imports = parser.parse_file(runtime.peru_file)
==> default:   File "/usr/local/lib/python3.4/dist-packages/peru/parser.py", line 22, in parse_file
==> default:     return parse_string(f.read(), name_prefix)
==> default:   File "/usr/local/lib/python3.4/dist-packages/peru/parser.py", line 32, in parse_string
==> default:     return _parse_toplevel(blob, name_prefix)
==> default:   File "/usr/local/lib/python3.4/dist-packages/peru/parser.py", line 36, in _parse_toplevel
==> default:     modules = _extract_modules(blob, name_prefix)
==> default:   File "/usr/local/lib/python3.4/dist-packages/peru/parser.py", line 100, in _extract_modules
==> default:     yaml_name)
==> default:   File "/usr/local/lib/python3.4/dist-packages/peru/parser.py", line 107, in _build_module
==> default:     default_rule = _extract_default_rule(blob)
==> default:   File "/usr/local/lib/python3.4/dist-packages/peru/parser.py", line 84, in _extract_default_rule
==> default:     return _extract_rule("<default>", blob)
==> default:   File "/usr/local/lib/python3.4/dist-packages/peru/parser.py", line 68, in _extract_rule
==> default:     untar/unzip a curl module, use the "unpack" field.'''))
==> default: peru.parser.ParserError: The "build" field is no longer supported. If you need to
==> default: untar/unzip a curl module, use the "unpack" field.

I'm using Vagrant 1.7.2, if it helps.

Optional steps and/or queries

The ability to skip some steps and/or queries sometimes would be nice.

My current use case: I'm using sql-runner to load data into Redshift using a dynamically generated manifest file, and sometimes the manifest file ends up empty. Whenever that happens, COPY just errors out.

Another option would be to say certain error codes are acceptable.

(this also might move sql-runner dangerously close to being a full-fledged workflow scheduler, heh)

Related to #32.

Add an environment variable function

It'd be nice to be able to support environment variables directly: it helps with the testing and reduces the chances of someone committing credentials :)

Create wiki page for sql-runner

The application is getting more and more features and it would be good to document them in a better place than the README.

Add -dryRun mode

It would be nice to be able to run through a whole playbook except for the actual SQL commands. This helps to validate that all SQL scripts can be found and all variables replaced.

Add ability to set a lockfile for a run

We often run SQL Runner inside a shell script which sets/checks a "run lock" to prevent the SQL Runner from starting if there was a problem completing the previous playbook run.

It would be nice to add this as a first class feature to SQL Runner, setting/checking the lock in Consul.

Proposal: two new CLI args:

* `-lock`     : Optional argument which checks and sets a lockfile to ensure this run is a singleton. Deletes lock on run completing successfully
* `-softLock` : Optional argument, like `-lock` but the lockfile will be deleted even if the run fails

#### More on Consul

Using the `-consul` argument results in the following changes:

* ...
* The `-lock` and `-softLock` arguments also become a key argument for Consul

If you pass in the default:

    ./sql-runner -consul "localhost:8500" -playbook "sql-runner/playbook/1" -lock "sql-runner/locks/1"

This results in:

* ...
* Checking and setting the lockfile at this key "sql-runner/locks/1"

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.