
schedule-python-script-using-Google-Cloud

Use Case: Automates ingestion of live Chicago traffic data into BigQuery for interactive, near-real-time analysis

Technical Concept: Schedules a simple Python script to append data into BigQuery using Google Cloud's App Engine with a cron job.

Source Data: https://data.cityofchicago.org/Transportation/Chicago-Traffic-Tracker-Congestion-Estimates-by-Se/n4j6-wkkf

Architecture Reference: http://zablo.net/blog/post/python-apache-beam-google-dataflow-cron

Shout out to Mylin Ackermann for all his help; his personal touch saved me weeks of research. https://www.linkedin.com/in/mylin-ackermann-25a00445/

Check me out on LinkedIn: https://www.linkedin.com/in/sungwonchung1/

Setup Prerequisites:

  1. Sign up for a Google Cloud account and enable billing
  2. Enable the BigQuery, Stackdriver, Google Cloud Deployment Manager V2, and Google Compute Engine APIs

Order of Operations:

  1. Develop scripts with Google Cloud Shell or the Cloud SDK
  2. Deploy the app on App Engine
  3. Deploy cron job
  4. Check BigQuery
  5. Connect with a data visualization tool such as Tableau

Development Instructions:

  1. Clone the GitHub repository into the Cloud SDK or Google Cloud Shell (thankfully Cloud Shell has persistent storage, so you don't have to re-clone the folder structure): git clone https://github.com/sungchun12/schedule-python-script-using-Google-Cloud.git
  2. Create a BigQuery dataset named "chicago_traffic"
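
Step 2 can be done in the BigQuery web UI, or from the shell if the Cloud SDK's bq CLI is available. A sketch (the guard just avoids an error on machines without the SDK installed):

```shell
# Create the target BigQuery dataset for the appended traffic data.
DATASET=chicago_traffic
if command -v bq >/dev/null 2>&1; then
  bq mk --dataset "$DATASET"
else
  echo "bq CLI not found; create the $DATASET dataset in the BigQuery UI instead"
fi
```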

Deploy Instructions:

  1. Remember to put __init__.py files into all local packages
  2. Change directory: cd ~/chicago-traffic
  3. Install all required packages into local lib folder: pip install -r requirements.txt -t lib
  4. To deploy App Engine app, run: gcloud app deploy app.yaml
  5. To deploy App Engine CRON, run: gcloud app deploy cron.yaml
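
For reference, an App Engine cron.yaml follows this shape. This is a minimal sketch; the URL path and schedule here are illustrative, not necessarily what this repo uses:

```yaml
cron:
- description: "append live Chicago traffic data to BigQuery"
  url: /            # handler route served by main.py (illustrative)
  schedule: every 5 minutes
```

Running gcloud app deploy cron.yaml registers the schedule; App Engine then issues an HTTP GET to the url on each tick, which is what triggers the append.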

Folder Structure:

(screenshot: folder structure)

__init__.py - needed to properly deploy within App Engine

append_data.py - calls the live Chicago traffic API and appends the data into BigQuery

app.yaml - definition of the Google App Engine application

appengine_config.py - adds dependencies from locally installed packages (the lib folder)

cron.yaml - definition of Google App Engine CRON job

main.py - entry point for the web application; calls the function contained within "append_data.py"

requirements.txt - file for the pip package manager, containing a list of all packages required to run the application and the pipeline

lib - local folder with all pip-installed packages from requirements.txt file
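
The pipeline in append_data.py boils down to: fetch JSON from the Socrata endpoint, reshape the records, and append them to BigQuery. Below is a self-contained sketch of the reshaping step, using a hard-coded sample payload instead of a live API call; the field names are illustrative and may not match the real dataset:

```python
import json

# A payload shaped like a Socrata JSON response (illustrative field names,
# not guaranteed to match the live Chicago Traffic Tracker dataset).
SAMPLE_PAYLOAD = '[{"segment_id": "101", "street": "Lake Shore Dr", "speed": "32.5"}]'

def to_rows(payload):
    """Parse a Socrata-style JSON payload into typed rows ready for BigQuery."""
    return [
        {
            "segment_id": int(record["segment_id"]),
            "street": record["street"],
            "speed": float(record["speed"]),
        }
        for record in json.loads(payload)
    ]

rows = to_rows(SAMPLE_PAYLOAD)
print(rows)  # [{'segment_id': 101, 'street': 'Lake Shore Dr', 'speed': 32.5}]
```

In the real script the typed rows would then be appended to the chicago_traffic dataset (e.g. via pandas-gbq, which the requirements list suggests).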

Contributors

dependabot[bot], sungchun12

schedule-python-script-using-google-cloud's Issues

Keep getting asked for credentials in CLI?

Heya,

This was just what I was looking for, but it seems I'm having major difficulties getting it to actually run.

I've done exactly what you've written, and tried both with an API key and without. I keep getting this error:

  File "/env/local/lib/python2.7/site-packages/pandas_gbq/gbq.py", line 194, in get_credentials
    credentials = self.get_user_account_credentials()
  File "/env/local/lib/python2.7/site-packages/pandas_gbq/gbq.py", line 370, in get_user_account_credentials
    credentials = app_flow.run_console()
  File "/env/local/lib/python2.7/site-packages/google_auth_oauthlib/flow.py", line 362, in run_console
    code = input(authorization_code_message)
  EOFError: EOF when reading a line

Do you have any idea why?

issues with requirements.txt file

Hi,

I'm trying to replicate your steps but keep getting the following errors. I updated requirements.txt with the following:

pandas==0.3.0
sodapy==1.4.6
datalab==1.1.4
gunicorn==19.7.1
six==1.10
Flask==0.12.2
pandas-gbq==0.5.0

and I got the following error:

tensorflow 1.10.0 has requirement numpy<=1.14.5,>=1.13.3, but you'll have numpy 1.15.0 which is incompatible.
tensorflow 1.10.0 has requirement setuptools<=39.1.0, but you'll have setuptools 40.1.0 which is incompatible.
requests 2.18.4 has requirement urllib3<1.23,>=1.21.1, but you'll have urllib3 1.23 which is incompatible.
datalab 1.1.4 has requirement pandas>=0.22.0, but you'll have pandas 0.3.0 which is incompatible.
pandas-profiling 1.4.1 has requirement pandas>=0.19, but you'll have pandas 0.3.0 which is incompatible.
seaborn 0.9.0 has requirement pandas>=0.15.2, but you'll have pandas 0.3.0 which is incompatible.

Let me know if I messed up something in the requirements table. New on this, so I apologize in advance.
