aws-samples / automated-image-augmentation-pipeline

License: MIT No Attribution

Serverless image augmentation pipeline to generate additional training data from existing images for machine learning use-cases

What is this?

This repository provides a deployable solution using Infrastructure-as-Code (IaC) templates to help you set up a serverless image augmentation pipeline to generate additional training data from existing images for machine learning use-cases.

Background

One significant challenge in machine learning is obtaining high-quality training data across applications and domains. The quality and quantity of training data directly influence model performance, generalization, robustness, bias mitigation, and interpretability. Insufficient or low-quality data can hinder a model's ability to learn effectively, generalize to unseen examples, handle diverse conditions, avoid biases, and provide interpretable results.

Image augmentation is a technique that creates additional training data from existing data by applying a variety of transformations to the original images.

Possible transformations include:

  • Rotation: Rotating the image by a certain angle, introducing variations in orientation.
  • Scaling: Resizing the image to a different scale, simulating different distances or perspectives.
  • Translation: Shifting the image horizontally or vertically, mimicking changes in position.
  • Flipping: Mirroring the image horizontally or vertically, creating reflections.
  • Shearing: Tilting the image along its axis, introducing distortion.
  • Zooming: Zooming in or out of the image, simulating changes in perspective.
  • Brightness and Contrast Adjustment: Changing the brightness or contrast of the image, altering lighting conditions.
  • Noise Injection: Adding random noise to the image, simulating variations in texture or artifacts.
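
To make a few of these transformations concrete, the sketch below implements flipping, rotation, brightness adjustment, and noise injection on a tiny grayscale image stored as a nested list of 0-255 integers. It is a dependency-free illustration only; the function names are hypothetical and are not part of this repository, and a real pipeline would more likely use an image library.

```python
import random


def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def adjust_brightness(img, delta):
    """Add delta to every pixel, clamping to the 0-255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]


def add_noise(img, amplitude, seed=None):
    """Add uniform integer noise in [-amplitude, amplitude] to every pixel."""
    rng = random.Random(seed)
    return [
        [max(0, min(255, p + rng.randint(-amplitude, amplitude))) for p in row]
        for row in img
    ]


# A 2x3 grayscale "image" used purely for illustration.
image = [[0, 50, 100],
         [150, 200, 250]]

print(flip_horizontal(image))        # [[100, 50, 0], [250, 200, 150]]
print(rotate_90_clockwise(image))    # [[150, 0], [200, 50], [250, 100]]
print(adjust_brightness(image, 20))  # [[20, 70, 120], [170, 220, 255]]
```

Each function returns a new image and leaves its input untouched, so several augmented variants can be derived from the same original.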

Problem Statement

However, implementing image augmentation poses challenges in scalability, performance, and the cost of maintaining the system, especially when the training set is large.

This repository showcases a deployable, automated workflow that applies image augmentation to a large set of training images using serverless technologies on AWS.

Solution Overview

Architecture

The architecture diagram above showcases the serverless image augmentation pipeline.

The serverless solution is built using AWS Serverless Application Model (SAM), an open-source framework for building serverless applications. During deployment, AWS SAM transforms and expands the SAM syntax into AWS CloudFormation syntax, allowing you to easily deploy the entire solution using a few simple commands.

The solution provisions the following AWS resources:

  1. S3 Bucket 1 for raw training images
  2. S3 Bucket 2 for augmented training images
  3. Lambda function written in Python that contains the image transformation logic
  4. Event trigger to execute Lambda function when images are uploaded to S3 bucket 1

Here's how it works

  • Users upload original training images into the S3 bucket for raw images, either in batches or all at once.
  • Each upload triggers a Lambda function that runs the Python image transformation logic against the raw images.
  • Depending on the volume of raw images uploaded, Lambda scales out to accommodate the load by provisioning multiple concurrent Lambda execution environments for data processing.
  • Once the image augmentation process is complete, the results are uploaded to the second S3 bucket for consumption.
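
The flow above can be sketched as a Lambda handler. This is a minimal illustration, not the code shipped in this repository: the `OUTPUT_BUCKET` environment variable, the `augment` placeholder, the `process_event` helper, and the `_aug<N>` key naming are all assumptions made for the example. The S3 client is passed in as a parameter so the logic can be exercised without AWS access.

```python
import os
import urllib.parse


def extract_s3_objects(event):
    """Yield (bucket, key) pairs from an S3 event notification payload."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


def augment(image_bytes, count):
    """Hypothetical placeholder: returns `count` unmodified copies."""
    return [image_bytes] * count


def process_event(event, s3_client, output_bucket, copies=3):
    """Read each uploaded object, augment it, and write results to output_bucket."""
    written = []
    for bucket, key in extract_s3_objects(event):
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        stem, ext = os.path.splitext(key)
        for i, data in enumerate(augment(body, copies)):
            out_key = f"{stem}_aug{i}{ext}"  # assumed naming scheme
            s3_client.put_object(Bucket=output_bucket, Key=out_key, Body=data)
            written.append(out_key)
    return written


def lambda_handler(event, context):
    # boto3 is imported lazily so the helpers above stay testable offline.
    import boto3

    output_bucket = os.environ["OUTPUT_BUCKET"]  # assumed variable name
    return {"written": process_event(event, boto3.client("s3"), output_bucket)}
```

Because each invocation handles only the objects named in its own event, concurrent executions do not need to coordinate with one another.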

Deploy the Solution

Prerequisites

To deploy the project with the AWS SAM framework, you need the following prerequisites:

Deployment Steps

Follow the steps below to deploy the solution using AWS SAM:

  1. Clone this project into your local environment

  2. Navigate to the image_augmentation_sam_app folder and run sam build

  3. Upon successful build, deploy the project with sam deploy --guided

  4. Adapt the image transformation logic to your specific use-case by editing the Python code in image_augmentation_sam_app/image_augmentation_function/app.py

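def generate_augmented_images(original_img, NUM_OF_IMAGES_GENERATED):
    # Collects the augmented copies of original_img
    imgs_distorted = []

    # Insert your image augmentation logic here, appending
    # NUM_OF_IMAGES_GENERATED augmented images to imgs_distorted

    print("Image augmentation completed!")
    return imgs_distorted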

Tip: you can leverage external Python libraries such as Albumentations to implement your image augmentation logic
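
As one possible way to fill in the placeholder, the sketch below applies a random, non-empty selection of transform functions to produce each augmented image. The `transforms` and `seed` parameters are hypothetical additions to the signature shown above, and the two example transforms operate on nested lists of pixel values rather than on a real image type, so treat this as a shape for the logic rather than the repository's implementation.

```python
import random


def generate_augmented_images(original_img, NUM_OF_IMAGES_GENERATED,
                              transforms=None, seed=None):
    """Return NUM_OF_IMAGES_GENERATED randomly transformed copies of original_img."""
    rng = random.Random(seed)
    # Hypothetical default: an identity transform, so the function still runs.
    transforms = transforms or [lambda img: img]

    imgs_distorted = []
    for _ in range(NUM_OF_IMAGES_GENERATED):
        # Pick each transform with probability 0.5; fall back to one at random
        # so that at least one transform is always applied.
        chosen = [t for t in transforms if rng.random() < 0.5]
        if not chosen:
            chosen = [rng.choice(transforms)]

        img = original_img
        for transform in chosen:
            img = transform(img)
        imgs_distorted.append(img)

    print("Image augmentation completed!")
    return imgs_distorted


# Example transforms on a grayscale image stored as nested lists (illustrative only).
def flip(img):
    return [row[::-1] for row in img]


def invert(img):
    return [[255 - p for p in row] for row in img]


original = [[0, 255], [10, 20]]
augmented = generate_augmented_images(original, 4, transforms=[flip, invert], seed=42)
print(len(augmented))  # 4
```

Passing a fixed `seed` makes the output reproducible, which can help when debugging a specific augmented image.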

Resources

Clean up

To remove the solution from your AWS account, follow these steps:

  1. Navigate to the image_augmentation_sam_app folder and run sam delete.

Security

See CONTRIBUTING for more information.
