Building Event Driven Application with AWS Lambda and Amazon Redshift Data API

Introduction

Event-driven applications, in which execution happens in response to events, are becoming popular with many customers. A primary benefit of this architecture is the decoupling of producer and consumer processes, which allows greater flexibility in application design. The example implemented here is an automated workflow, triggered by an event, that executes a series of transformations in the data warehouse using Amazon Redshift, AWS Lambda, Amazon EventBridge and Amazon Simple Notification Service (Amazon SNS).

In response to a scheduled event defined in Amazon EventBridge, this application automatically triggers an AWS Lambda function that executes a stored procedure performing extract, load and transform (ELT) operations in the Amazon Redshift data warehouse through the Amazon Redshift Data API. The stored procedure copies the source data from Amazon Simple Storage Service (Amazon S3) to Amazon Redshift and aggregates the results. Once it completes, an event is sent to Amazon EventBridge, which triggers a Lambda function to notify end users through Amazon SNS that updated data is available in Amazon Redshift.
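
The completion hop in this flow can be pictured as an EventBridge event pattern plus an equivalent local check. This is a sketch under assumptions, not the rule the template actually deploys: the `source` and `detail-type` values follow the event format AWS documents for the Redshift Data API, while the statement name `elt-run` and the helper `is_elt_completion` are hypothetical names chosen for illustration.

```python
# Sketch only: STATEMENT_NAME and is_elt_completion are hypothetical names.
STATEMENT_NAME = "elt-run"

# EventBridge pattern for Data API status-change events of one named statement.
COMPLETION_PATTERN = {
    "source": ["aws.redshift-data"],
    "detail-type": ["Redshift Data Statement Status Change"],
    "detail": {"statementName": [STATEMENT_NAME], "state": ["FINISHED"]},
}


def is_elt_completion(event: dict) -> bool:
    """Apply locally the same checks that the pattern expresses declaratively."""
    detail = event.get("detail", {})
    expected = COMPLETION_PATTERN["detail"]
    return (
        event.get("source") in COMPLETION_PATTERN["source"]
        and event.get("detail-type") in COMPLETION_PATTERN["detail-type"]
        and detail.get("statementName") in expected["statementName"]
        and detail.get("state") in expected["state"]
    )


# Abbreviated example event; the identifiers are made up for illustration.
sample_event = {
    "source": "aws.redshift-data",
    "detail-type": "Redshift Data Statement Status Change",
    "detail": {
        "statementName": STATEMENT_NAME,
        "statementId": "00000000-0000-0000-0000-000000000000",
        "state": "FINISHED",
    },
}

print(is_elt_completion(sample_event))  # True
```

The Data API emits such events only for statements submitted with `WithEvent=True`, so the pattern matches nothing unless the submitting call opts in.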

This event-driven serverless architecture offers greater extensibility and simplicity, making the application easier to maintain, faster to extend with new features, and less affected by changes. It also simplifies adding other components or third-party products to the application without many changes.

Pre-requisites

As a pre-requisite for creating the application described here, you need to set up an Amazon Redshift cluster and associate it with an AWS Identity and Access Management (IAM) role. If you don’t have one provisioned in your AWS account, please follow the Amazon Redshift getting started guide to set it up.

Solution architecture

We use the public NYC Yellow Taxi dataset for the year 2015, which we have pre-populated in the Amazon S3 bucket folder “event-driven-app-with-lambda-redshift/nyc_yellow_taxi_raw/”.

The following architecture diagram highlights the end-to-end solution:
Architecture Diagram

Below is the execution flow for this solution, which you can deploy with the CloudFormation template:

  1. Database objects in the Amazon Redshift cluster:

     • Table "nyc_yellow_taxi", into which the New York taxi dataset above is copied from Amazon S3.
     • Materialized view "nyc_yellow_taxi_volume_analysis", providing an aggregated view of the above table.
     • Stored procedure "execute_elt_process", which takes care of the data transformations.

  2. Amazon EventBridge rule, EventBridgeScheduledEventRule, triggered periodically based on a cron expression.

  3. AWS IAM role, “LambdaRedshiftDataApiETLRole”, which grants AWS Lambda the following permissions:

     • Federate to the Amazon Redshift cluster through the getClusterCredentials permission, avoiding password credentials.
     • Execute queries in the Amazon Redshift cluster through redshift-data API calls.
     • Log to Amazon CloudWatch for troubleshooting purposes.
     • Send notifications through Amazon Simple Notification Service (SNS).

  4. AWS Lambda function, “LambdaRedshiftDataApiETL”, which is triggered automatically with action “execute_sql” as soon as the above scheduled event fires. It makes an asynchronous call to the stored procedure "execute_elt_process" in Amazon Redshift, which performs the extract, load and transform (ELT) operations, using the “redshift-data” client of the Amazon Redshift Data API. Based on the input parameter “action”, this function can either execute Structured Query Language (SQL) statements asynchronously in Amazon Redshift, which avoids timing out on long-running statements, or publish custom notifications through Amazon SNS. It uses the temporary credentials functionality of the Data API, which lets it communicate with Amazon Redshift using AWS Identity and Access Management (IAM) permissions, without any password-based authentication. With the Data API there is also no need to configure drivers and connections for your Amazon Redshift cluster; this is handled automatically.

  5. Amazon EventBridge rule, “EventBridgeRedshiftEventRule”, which automatically captures the completion event generated by the above stored procedure call. This triggers the above AWS Lambda function again, with action "notify".

  6. Amazon Simple Notification Service (SNS) topic, RedshiftNotificationTopicSNS, with a subscription to your email address, which sends an automated email notification on completion of the ELT process, as triggered by the AWS Lambda function.

  7. The database objects mentioned in step 1 are provisioned automatically by a Lambda function, LambdaSetupRedshiftObjects, as part of the CloudFormation template, through an invocation of the Lambda function LambdaRedshiftDataApiETL created in step 4.
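
The action-based behaviour of the Lambda function described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code shipped with the CloudFormation template: the helper `handle`, the `sql` event key, the `config` keys and the environment variable names are hypothetical, while `execute_statement` and `publish` are the boto3 calls of the `redshift-data` and `sns` clients.

```python
import os


def handle(event, redshift_data, sns, config):
    """Route one invocation by event["action"].

    The clients are passed in so the routing can be exercised without AWS.
    """
    action = event.get("action")
    if action == "execute_sql":
        # execute_statement returns immediately; the SQL keeps running in Redshift.
        response = redshift_data.execute_statement(
            ClusterIdentifier=config["cluster_id"],
            Database=config["database"],
            DbUser=config["db_user"],  # temporary credentials, no password
            Sql=event["sql"],
            StatementName=config["statement_name"],
            WithEvent=True,  # ask the Data API to emit an EventBridge event
        )
        return {"statement_id": response["Id"]}
    if action == "notify":
        sns.publish(
            TopicArn=config["topic_arn"],
            Subject="ELT process completed",
            Message="Updated data is available in Amazon Redshift.",
        )
        return {"notified": True}
    raise ValueError(f"Unsupported action: {action!r}")


def lambda_handler(event, context):
    # boto3 is imported here so that handle() stays importable without it.
    import boto3

    # Hypothetical environment variable names; adapt them to your deployment.
    config = {
        "cluster_id": os.environ["REDSHIFT_CLUSTER_ID"],
        "database": os.environ["REDSHIFT_DATABASE"],
        "db_user": os.environ["REDSHIFT_DB_USER"],
        "statement_name": os.environ.get("STATEMENT_NAME", "elt-run"),
        "topic_arn": os.environ["SNS_TOPIC_ARN"],
    }
    return handle(event, boto3.client("redshift-data"), boto3.client("sns"), config)
```

Keeping both actions in one function mirrors the design described in the flow: the scheduled rule and the completion rule invoke the same function with different inputs, so only one role and one deployment artifact are needed.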

Testing the code:

  1. After setting up the above solution, you should have an automated pipeline that triggers on the schedule you defined in the cron expression of the Amazon EventBridge scheduled rule. You can view the Amazon CloudWatch logs to troubleshoot any issues in the Lambda function. Below is an example of the execution logs for reference:
    Amazon CloudWatch logs

  2. You can also view the query execution status in the Amazon Redshift console, which also shows the detailed execution plan for the queries you executed. One key thing to note: although the stored procedure may take around six minutes to complete, both executions of the AWS Lambda function finish within just a few seconds. This is because the calls from AWS Lambda to Amazon Redshift are asynchronous: the Lambda function completes as soon as it has initiated the process in Amazon Redshift, without waiting for the query to finish.
    Amazon Redshift Console Output

  3. After this process is complete, you will receive the notification email shown below, denoting completion of the ELT process:
    Notification email
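
Because the Lambda function does not wait for the query, the statement status can also be checked from a script instead of the console. The sketch below polls the Data API's `describe_statement` call until the statement reaches a terminal state. The helper name `wait_for_statement` and its polling parameters are assumptions for illustration; in practice the client would be created with `boto3.client("redshift-data")` and the statement ID would be the `Id` returned by `execute_statement`.

```python
import time

# States after which the Data API will not change the statement status again.
TERMINAL_STATES = {"FINISHED", "FAILED", "ABORTED"}


def wait_for_statement(client, statement_id, poll_seconds=5.0, max_polls=120,
                       sleep=time.sleep):
    """Poll describe_statement until the statement finishes, fails or is aborted.

    Returns the terminal status string, or raises TimeoutError after max_polls.
    """
    for _ in range(max_polls):
        description = client.describe_statement(Id=statement_id)
        status = description["Status"]
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(
        f"Statement {statement_id} still running after {max_polls} polls"
    )
```

This kind of polling suits ad hoc checks from a workstation. Inside the pipeline itself, the EventBridge completion event serves the same purpose without keeping a process waiting.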

Conclusion

The Amazon Redshift Data API lets you interact with Amazon Redshift painlessly and build event-driven, cloud-native applications. We demonstrated how to build an event-driven application with Amazon Redshift, AWS Lambda and Amazon EventBridge. To learn more about the Amazon Redshift Data API, please visit this blog and the documentation.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
