This is pre-alpha code, completely untested.
This architecture decouples the batch engine from the workflow orchestration layer.
- Workflows are defined in JSON and can also integrate with applications other than AWS Batch (see the state machine sketch after this list).
- Array jobs in AWS Batch can fan out individual steps and also support job dependency models (see the fan-out sketch after this list).
- AWS Batch supports dynamic compute provisioning and scaling, so we can run 10 jobs or 10,000 jobs without paying for idle resources.
- Data Sharing: Jobs are managed at the container level, not the instance level, so we cannot guarantee that the containers in a workflow will run on the same instance. Stage all data in Amazon S3, and read and write everything from there (see the job-script sketch after this list). This is also important for traceability, logging, and debugging.
- Multitenancy: Multiple containers running batch processes may share the same instance and the same base working directory. Within the scratch directory, each batch process creates a subfolder with a unique ID and writes all of its scratch data to that subdirectory.
- Volume Reuse: Scratch data should live only as long as the job that uses it, to keep instance and Amazon EBS storage costs down. Within the scratch directory, each batch process creates a subfolder with a unique ID and deletes that subdirectory at the end of the job.
- Example of passing an environment variable from the trigger Lambda, through the Step Functions state machine, to the AWS Batch job (see the sketch after this list).
- The container running in Amazon ECS (managed by AWS Batch) needs an IAM role to call other AWS services (see the job definition sketch after this list).
- A first test run
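
As a minimal sketch of the JSON workflow definition mentioned above, the following registers a one-step Step Functions state machine whose single task submits an AWS Batch job and waits for it to finish. Every ARN and name here is a placeholder, not part of the actual project.

```python
"""Minimal sketch: define a one-step workflow as JSON and register it as a
Step Functions state machine. All ARNs and names below are placeholders."""
import json
import boto3

# One Task state that submits an AWS Batch job and waits for it to complete
# (the .sync integration pattern).
definition = {
    "Comment": "Single-step batch workflow",
    "StartAt": "RunBatchJob",
    "States": {
        "RunBatchJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
                "JobName": "example-step",
                "JobQueue": "arn:aws:batch:us-east-1:123456789012:job-queue/example-queue",
                "JobDefinition": "arn:aws:batch:us-east-1:123456789012:job-definition/example-jobdef:1",
            },
            "End": True,
        }
    },
}

sfn = boto3.client("stepfunctions")
response = sfn.create_state_machine(
    name="example-batch-workflow",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/example-stepfunctions-role",
)
print(response["stateMachineArn"])
```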
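A minimal sketch of the array-job fan-out and dependency model described above: one submission fans out into many child jobs, and a dependent merge job starts only after they all succeed. The queue and job definition names are placeholders.

```python
"""Minimal sketch of fan-out with an array job plus a dependent merge job.
Queue and job definition names are placeholders."""
import boto3

batch = boto3.client("batch")

# Fan out: one submission becomes 100 child jobs; each child sees its own
# index in the AWS_BATCH_JOB_ARRAY_INDEX environment variable.
fan_out = batch.submit_job(
    jobName="process-chunks",
    jobQueue="example-queue",
    jobDefinition="example-jobdef",
    arrayProperties={"size": 100},
)

# Fan in: a single merge job that depends on the array job above.
merge = batch.submit_job(
    jobName="merge-results",
    jobQueue="example-queue",
    jobDefinition="example-jobdef",
    dependsOn=[{"jobId": fan_out["jobId"]}],
)
print(merge["jobId"])
```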
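Taken together, the data sharing, multitenancy, and volume reuse bullets could look roughly like this inside a job's entry point. The bucket, keys, scratch path, and processing step are placeholder assumptions, and the real layout may differ.

```python
"""Minimal sketch of the per-job scratch-directory convention: stage inputs
from S3, work in a uniquely named subdirectory, stage outputs back to S3, and
delete the scratch data when the job ends. Paths and bucket/keys are placeholders."""
import os
import shutil
import uuid
import boto3

SCRATCH_BASE = "/scratch"      # shared base working directory on the instance
BUCKET = "example-bucket"      # placeholder

s3 = boto3.client("s3")

# Unique subfolder per batch process, so containers sharing an instance
# (multitenancy) never collide. AWS Batch sets AWS_BATCH_JOB_ID for real jobs.
job_id = os.environ.get("AWS_BATCH_JOB_ID", str(uuid.uuid4()))
scratch_dir = os.path.join(SCRATCH_BASE, job_id)
os.makedirs(scratch_dir, exist_ok=True)

try:
    # Stage input from S3, do the work, and write results back to S3 so every
    # step of the workflow reads and writes through S3.
    local_input = os.path.join(scratch_dir, "input.dat")
    s3.download_file(BUCKET, "inputs/input.dat", local_input)

    local_output = os.path.join(scratch_dir, "output.dat")
    with open(local_input, "rb") as src, open(local_output, "wb") as dst:
        dst.write(src.read())  # placeholder for the real processing step

    s3.upload_file(local_output, BUCKET, f"outputs/{job_id}/output.dat")
finally:
    # Volume reuse: scratch data lives only as long as the job that created it.
    shutil.rmtree(scratch_dir, ignore_errors=True)
```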
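One possible shape for the environment variable flow from the trigger Lambda through Step Functions to the Batch job: the Lambda puts a value in the execution input, and the state machine maps it onto the container environment. The state machine ARN and the variable name INPUT_KEY are assumptions for illustration.

```python
"""Minimal sketch of how a value could flow from a trigger Lambda, through the
Step Functions execution input, into the AWS Batch container environment.
The state machine ARN and the INPUT_KEY variable name are placeholders."""
import json
import boto3

sfn = boto3.client("stepfunctions")

def handler(event, context):
    # The trigger Lambda forwards a value in the execution input ...
    sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine/example-batch-workflow",
        input=json.dumps({"input_key": event.get("input_key", "inputs/input.dat")}),
    )

# ... and the state machine's Batch task maps that value onto a container
# environment variable via ContainerOverrides in the task Parameters, e.g.:
#
#   "ContainerOverrides": {
#     "Environment": [
#       {"Name": "INPUT_KEY", "Value.$": "$.input_key"}
#     ]
#   }
#
# Inside the job, the container then reads os.environ["INPUT_KEY"].
```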
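Finally, a minimal sketch of attaching that IAM role to the container through the job definition's jobRoleArn, so code inside the container (such as the S3 staging above) can call other AWS services. The image and role ARNs are placeholders.

```python
"""Minimal sketch of registering a job definition whose containers assume an
IAM job role. ARNs and the image name are placeholders."""
import boto3

batch = boto3.client("batch")

batch.register_job_definition(
    jobDefinitionName="example-jobdef",
    type="container",
    containerProperties={
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-image:latest",
        "resourceRequirements": [
            {"type": "VCPU", "value": "1"},
            {"type": "MEMORY", "value": "2048"},
        ],
        # IAM role handed to the container at run time; it grants the job's
        # code permission to call other AWS services.
        "jobRoleArn": "arn:aws:iam::123456789012:role/example-batch-job-role",
    },
)
```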