
dask-era5's Introduction

Working with ERA5 using Dask and AWS Fargate

This example uses AWS CloudFormation to create an Amazon SageMaker Jupyter notebook and an AWS Fargate cluster, so that Dask can distribute computation over large data volumes.

The Jupyter notebook shows how to use Dask to load NetCDF files directly from S3. It then computes the mean and standard deviation of the loaded data to demonstrate how Dask can accelerate computations over large data volumes. Finally, it extracts time series from the loaded data to show how to select specific locations in a raster field.
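The notebook does this with Dask on real ERA5 files. As a dependency-free illustration of the idea (not the notebook's code), the sketch below splits a list of values into chunks, reduces each chunk to a small partial result in parallel, and merges the partials. This is the same kind of chunk-then-aggregate reduction Dask performs on chunked arrays. The merge uses the parallel variance update of Chan et al. instead of summing squares, for numerical stability. The nearest-grid-point lookup is a plain-Python stand-in for what xarray's `sel(..., method="nearest")` does; all function names and the `field[time][lat][lon]` layout are assumptions for illustration.

```python
import functools
import math
from concurrent.futures import ThreadPoolExecutor


def chunk_moments(chunk):
    """Return (count, mean, sum of squared deviations) for one chunk."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def combine_moments(a, b):
    """Merge two partial results using the parallel variance update (Chan et al.)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def chunked_mean_std(values, chunk_size, max_workers=4):
    """Mean and population standard deviation (ddof=0), computed chunk by chunk."""
    if not values:
        raise ValueError("values must not be empty")
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # Map step: each chunk is reduced independently, here on a thread pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(chunk_moments, chunks))
    # Reduce step: merge the small partial results into a single one.
    n, mean, m2 = functools.reduce(combine_moments, partials)
    return mean, math.sqrt(m2 / n)


def nearest_index(coords, target):
    """Index of the grid coordinate closest to target."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


def point_time_series(field, lats, lons, lat, lon):
    """Extract the series at the grid cell nearest (lat, lon).

    field is indexed as field[time][lat_index][lon_index]; lon must use the
    same convention as lons (for example 0 to 360 rather than -180 to 180).
    """
    i = nearest_index(lats, lat)
    j = nearest_index(lons, lon)
    return [frame[i][j] for frame in field]
```

For example, `chunked_mean_std([float(x) for x in range(10)], chunk_size=3)` returns a mean of 4.5 and a standard deviation of `math.sqrt(8.25)`, the same values a single pass over all ten numbers gives. Only the small `(count, mean, m2)` tuples need to travel between workers, which is why this pattern scales to data that does not fit in one machine's memory.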

Getting started


  1. Launch the stack. By default it is created in the us-east-1 region (where the ERA5 data is stored), but you can change this to any region you prefer.
  2. On the Parameters page, enter your DaskWorkerGitToken, which is a GitHub OAuth token. See below for how to get one if you don't have it. You can leave all the other parameters at their defaults for now.
  3. Click Next twice, then acknowledge that the stack will create IAM resources.
  4. Wait for the stack to be created, then open the Outputs tab for the link to your Jupyter notebook.
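The console steps above can also be scripted. The sketch below assumes boto3 is installed and AWS credentials are configured; the stack name `dask-era5` is hypothetical, and the template URL is a placeholder you must supply because it is not given here. Acknowledging IAM resources in the console corresponds to passing `CAPABILITY_IAM`; if the template creates named IAM resources, `CAPABILITY_NAMED_IAM` would be required instead.

```python
def build_create_stack_kwargs(template_url, git_token, stack_name="dask-era5"):
    """Arguments for CloudFormation create_stack (stack name is hypothetical)."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": "DaskWorkerGitToken", "ParameterValue": git_token},
        ],
        # Equivalent of ticking the "this will create IAM resources" box.
        "Capabilities": ["CAPABILITY_IAM"],
    }


def launch_stack(template_url, git_token, region="us-east-1"):
    """Create the stack, wait for completion, and return its Outputs as a dict."""
    import boto3  # imported lazily so the helper above works without boto3

    cfn = boto3.client("cloudformation", region_name=region)
    kwargs = build_create_stack_kwargs(template_url, git_token)
    cfn.create_stack(**kwargs)
    # Block until the stack reaches CREATE_COMPLETE (raises on failure).
    cfn.get_waiter("stack_create_complete").wait(StackName=kwargs["StackName"])
    stack = cfn.describe_stacks(StackName=kwargs["StackName"])["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}
```

The returned dictionary holds the same values shown on the Outputs tab, including the notebook link.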

GitHub OAuth Token

The stack needs a GitHub OAuth token so that AWS can build the Docker container image for the Dask worker and scheduler nodes. To generate a token, go to https://github.com/settings/tokens. The token only needs the public_repo scope.
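To confirm that a classic token carries the required scope, one option is to read the `X-OAuth-Scopes` response header that GitHub's REST API returns for classic tokens. The helper names below are hypothetical, and fine-grained tokens do not report scopes this way. The broader `repo` scope also covers public repositories, so it is accepted too.

```python
import urllib.request


def parse_scopes(header_value):
    """Turn an 'X-OAuth-Scopes' header such as 'repo, user' into a set."""
    if not header_value:
        return set()
    return {s.strip() for s in header_value.split(",") if s.strip()}


def has_public_repo_access(scopes):
    """True if the scopes include public_repo or the broader repo scope."""
    return "public_repo" in scopes or "repo" in scopes


def fetch_token_scopes(token):
    """Ask the GitHub API which scopes a classic token has (needs network)."""
    req = urllib.request.Request(
        "https://api.github.com/user",
        headers={"Authorization": f"token {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_scopes(resp.headers.get("X-OAuth-Scopes"))
```

For example, `has_public_repo_access(fetch_token_scopes(my_token))` returning `False` would mean the token lacks the scope the stack needs.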

Architecture

[Figure: architecture diagram]

dask-era5's People

Contributors

zflamig, rsignell-usgs


dask-era5's Issues

Cannot connect to scheduler in era5_fargate_dask example

Hi! I've been trying to set up a Fargate cluster with your CloudFormation template.

I'm currently having trouble connecting to the scheduler.

The Dask scheduler port specified in the CloudFormation template is 8786, which matches the port in the code block:

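from dask.distributed import Client
# Connect to the Dask scheduler on port 8786 and display the client summary
client = Client('Dask-Scheduler.local-dask:8786')
client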

@zflamig, have you run into this issue before? I've run out of debugging ideas for the moment.

Please let me know if you have any ideas.

Thanks!
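Because `Client` only reports that it cannot connect, it can help to separate DNS resolution from TCP reachability. A minimal standard-library check, run from the notebook instance, is sketched below; the helper name `can_reach` is hypothetical. The scheduler address looks like a private DNS name, which (as an assumption) is likely to resolve only from inside the stack's VPC, so a DNS failure would point at networking rather than at Dask itself.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Report whether host resolves and accepts TCP connections on port."""
    try:
        addr = socket.gethostbyname(host)
    except socket.gaierror as exc:
        print(f"DNS lookup failed for {host}: {exc}")
        return False
    try:
        # Open and immediately close a plain TCP connection.
        with socket.create_connection((addr, port), timeout=timeout):
            return True
    except OSError as exc:  # covers refused connections and timeouts
        print(f"TCP connect to {addr}:{port} failed: {exc}")
        return False
```

Calling `can_reach("Dask-Scheduler.local-dask", 8786)` before creating the `Client` shows whether the name fails to resolve or the port is blocked (for example by a security group rule).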

era5 CodeBuild pipeline failed due to dockerhub pull rate limit on miniconda

@zflamig, the CodeBuild pipeline failed because of the Docker Hub pull rate limit on the miniconda image:

Step 1/9 : FROM continuumio/miniconda3:4.7.12
toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

@ocefpaf, do you know if there is a different place to pull the Miniconda container from now?

Stack failed on CodeBuildLogs

Resource of type 'AWS::Logs::LogGroup' with identifier '{"/properties/LogGroupName":"/aws/codebuild/era5"}' already exists.

[Screenshot: 2020-11-16_9-26-19]
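One common cause of this error is a log group with that name left over from an earlier deployment; this is an assumption, not a confirmed diagnosis. The boto3 sketch below checks for the exact name and, only when `dry_run=False`, deletes it so the stack can create it again. Deleting a log group permanently removes its logs. The function names and default region are hypothetical, and only the first page of `describe_log_groups` results is examined.

```python
def log_group_exists(response, name):
    """True if a describe_log_groups response contains exactly this name."""
    return any(g.get("logGroupName") == name for g in response.get("logGroups", []))


def delete_stale_log_group(name="/aws/codebuild/era5", region="us-east-1", dry_run=True):
    """Find (and optionally delete) a leftover log group that blocks the stack."""
    import boto3  # imported lazily so log_group_exists works without boto3

    logs = boto3.client("logs", region_name=region)
    # The prefix search can also return longer names, so match exactly below.
    resp = logs.describe_log_groups(logGroupNamePrefix=name)
    if not log_group_exists(resp, name):
        return False
    if not dry_run:
        logs.delete_log_group(logGroupName=name)
    return True
```

Running it first with the default `dry_run=True` only reports whether the conflicting log group is present.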

Pattern matching in stack deployment does not accept underscores

I generated a GitHub access token to deploy the stack. However, I cannot enter it in the deployment without getting an error:

Parameter 'DaskWorkerGitToken' must match pattern [a-zA-Z0-9]+

Is this pattern a custom constraint that worked with older tokens but no longer matches the current token format? Is there another way to generate a token that matches it?

Let me know if you need any more information.

Thanks!
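GitHub's newer token formats start with a prefix such as `ghp_`, which contains an underscore, so such tokens cannot satisfy `[a-zA-Z0-9]+`. The sketch below shows the mismatch. `RELAXED_PATTERN` is a hypothetical fix, not the project's template, and the check assumes CloudFormation applies the pattern to the whole parameter value (modelled here with `re.fullmatch`).

```python
import re

# Pattern reported in the error message above.
CURRENT_PATTERN = r"[a-zA-Z0-9]+"
# Hypothetical relaxed pattern that also allows underscores.
RELAXED_PATTERN = r"[a-zA-Z0-9_]+"


def matches(pattern, value):
    """True if the entire value matches the pattern."""
    return re.fullmatch(pattern, value) is not None
```

A made-up token such as `"ghp_" + "A1b2C3d4" * 4` fails `CURRENT_PATTERN` only because of the underscore, while plain alphanumeric strings pass both patterns.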
