aws-step-functions-data-science-sdk-r

Step Functions Data Science SDK for building machine learning (ML) workflows and pipelines on AWS. This package uses paws to connect to AWS.

Install:

CRAN Version

# TBC

Dev Version

remotes::install_github("DyfanJones/aws-step-functions-data-science-sdk-r")

Building a Workflow

Note: this example is taken from https://github.com/aws/aws-step-functions-data-science-sdk-python

Steps

You create steps using the SDK, and chain them together into sequential workflows. Then, you can create those workflows in AWS Step Functions and execute them in Step Functions directly from your R code. For example, the following is how you define a pass step.

library(stepfunctions)
start_pass_state = Pass$new(
    state_id="MyPassState"
)

The following is how you define a wait step.

wait_state = Wait$new(
    state_id="Wait for 3 seconds",
    seconds=3
)

The following example shows how to define a Lambda step and then add a Retry and a Catch to it.

lambda_state = LambdaStep$new(
  state_id="Convert HelloWorld to Base64",
  parameters=list(
    "FunctionName"="MyLambda", #replace with the name of your function
    "Payload"=list(
      "input"="HelloWorld")
  )
)
lambda_state$add_retry(Retry$new(
  error_equals="States.TaskFailed",
  interval_seconds=15,
  max_attempts=2,
  backoff_rate=4.0
))
lambda_state$add_catch(Catch$new(
  error_equals="States.TaskFailed",
  next_step=Fail$new("LambdaTaskFailed")
))
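To make the retry settings concrete, the sketch below computes how long Step Functions would wait before each retry under the Amazon States Language backoff rule: the first retry waits for the interval, and each later retry multiplies the previous wait by the backoff rate. The helper `retry_waits` is hypothetical and not part of the stepfunctions package; it uses base R only.

```r
# Hypothetical helper (not part of stepfunctions): seconds waited before
# each retry attempt, following the Amazon States Language backoff rule.
retry_waits <- function(interval_seconds, max_attempts, backoff_rate) {
  if (max_attempts < 1) return(numeric(0))
  interval_seconds * backoff_rate^(seq_len(max_attempts) - 1)
}

# Values from the Retry defined above
waits <- retry_waits(interval_seconds = 15, max_attempts = 2, backoff_rate = 4.0)
print(waits)       # 15 60
print(sum(waits))  # 75 seconds of waiting in total before the Catch applies
```

With these settings the Lambda task is retried after 15 seconds and again after 60 seconds; if it still fails, the Catch routes execution to the Fail state.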

Workflows

After you define these steps, chain them together into a logical sequence.

workflow_definition=Chain$new(c(start_pass_state, wait_state, lambda_state))

Once the steps are chained together, you can define the workflow.

# change execution role to your execution role
stepfunctions_execution_role="dummy-role"
workflow = Workflow$new(
  name="MyWorkflow_v1234",
  definition=workflow_definition,
  role=stepfunctions_execution_role
)

Visualizing a Workflow

The following generates a graphical representation of your workflow. Please note that visualization currently only works in Jupyter notebooks. Visualization is not available in JupyterLab or RStudio.

workflow$render_graph()

Review a Workflow Definition

The following renders the JSON of the Amazon States Language definition of the workflow you created.

workflow$definition$to_json(pretty=TRUE)
{
  "StartAt": "MyPassState",
  "States": {
    "MyPassState": {
      "Type": "Pass",
      "Next": "Wait for 3 seconds"
    },
    "Wait for 3 seconds": {
      "Seconds": 3,
      "Type": "Wait",
      "Next": "Convert HelloWorld to Base64"
    },
    "Convert HelloWorld to Base64": {
      "Parameters": {
        "FunctionName": "MyLambda",
        "Payload": {
          "input": "HelloWorld"
        }
      },
      "Resource": "arn:aws:states:::lambda:invoke",
      "Type": "Task",
      "End": true,
      "Retry": [
        {
          "Error_equals": [
            "States.TaskFailed"
          ],
          "Interval_seconds": 15,
          "Max_attempts": 2,
          "Backoff_rate": 4
        }
      ],
      "Catch": [
        {
          "Error_equals": "States.TaskFailed",
          "Next": "LambdaTaskFailed"
        }
      ]
    },
    "LambdaTaskFailed": {
      "Type": "Fail"
    }
  }
}
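A quick way to sanity-check a definition like this is to follow the `Next` pointers from `StartAt` until a state without a `Next` is reached. The sketch below does that on a hand-written nested list that mirrors the JSON above (only the fields needed for the walk are kept). `walk_states` is a hypothetical helper and does not call the stepfunctions package.

```r
# Hand-written copy of the definition above, reduced to the fields we need.
definition <- list(
  StartAt = "MyPassState",
  States = list(
    "MyPassState" = list(Type = "Pass", Next = "Wait for 3 seconds"),
    "Wait for 3 seconds" = list(Type = "Wait", Next = "Convert HelloWorld to Base64"),
    "Convert HelloWorld to Base64" = list(Type = "Task", End = TRUE),
    "LambdaTaskFailed" = list(Type = "Fail")
  )
)

# Hypothetical helper: follow Next pointers from StartAt and return the
# names of the states visited, in order.
walk_states <- function(definition) {
  path <- character(0)
  current <- definition[["StartAt"]]
  while (!is.null(current)) {
    if (current %in% path) stop("cycle detected at state: ", current)
    state <- definition[["States"]][[current]]
    if (is.null(state)) stop("unknown state: ", current)
    path <- c(path, current)
    current <- state[["Next"]]  # NULL when the state has no Next
  }
  path
}

print(walk_states(definition))
```

The walk covers only the main path; the Fail state is reachable solely through the Catch, so it does not appear in the result.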

For more examples, please check out the examples.

aws-step-functions-data-science-sdk-r's Issues

Support all services

https://docs.aws.amazon.com/step-functions/latest/dg/concepts-service-integrations.html

Service | Request Response | Run a Job (.sync) | Wait for Callback (.waitForTaskToken)
Lambda
AWS Batch
DynamoDB
Amazon ECS/AWS Fargate
Amazon SNS
Amazon SQS
AWS Glue
SageMaker
Amazon EMR
Amazon EMR on EKS
CodeBuild
Athena
Amazon EKS
API Gateway
AWS Glue DataBrew
Amazon EventBridge
AWS Step Functions

Unit tests

Create unit tests to check how stable the package currently is.

Decouple R6sagemaker

Currently R6sagemaker is a "soft" dependency, as it is only required for the sagemaker classes. However, if a user is using Python sagemaker through reticulate, then stepfunctions should be able to support it.

Goal: support both R6sagemaker and Python sagemaker through reticulate.

Render graphs in RStudio

Currently stepfunctions creates an HTML script and then renders it in a Jupyter notebook using IRdisplay::display_html. For RStudio to render the graph, there are two options: use an htmlwidget, or rebuild the flow using igraph.
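For the igraph route, a first step would be turning a workflow definition into an edge list. The sketch below is one possible shape for that, using base R on a hand-written nested list that stands in for a parsed definition. `state_edges` is a hypothetical helper, not part of stepfunctions, and it handles only `Next` and `Catch` transitions.

```r
# Hand-written stand-in for a parsed workflow definition.
definition <- list(
  StartAt = "MyPassState",
  States = list(
    "MyPassState" = list(Type = "Pass", Next = "Wait for 3 seconds"),
    "Wait for 3 seconds" = list(Type = "Wait", Next = "Convert HelloWorld to Base64"),
    "Convert HelloWorld to Base64" = list(
      Type = "Task", End = TRUE,
      Catch = list(list(Next = "LambdaTaskFailed"))
    ),
    "LambdaTaskFailed" = list(Type = "Fail")
  )
)

# Hypothetical helper: one row per transition, tagged as "next" or "catch".
state_edges <- function(definition) {
  rows <- list()
  for (name in names(definition[["States"]])) {
    state <- definition[["States"]][[name]]
    if (!is.null(state[["Next"]])) {
      rows[[length(rows) + 1]] <- data.frame(
        from = name, to = state[["Next"]], kind = "next",
        stringsAsFactors = FALSE
      )
    }
    # A NULL Catch makes this loop run zero times
    for (catcher in state[["Catch"]]) {
      rows[[length(rows) + 1]] <- data.frame(
        from = name, to = catcher[["Next"]], kind = "catch",
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, rows)
}

edges <- state_edges(definition)
print(edges)
```

The data frame uses the from/to layout that `igraph::graph_from_data_frame()` accepts, with `kind` carried along as an edge attribute that could drive styling of the error paths.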
