AWS Redshift Infrastructure Automation

As analytics solutions have moved away from a one-size-fits-all model toward choosing the right tool for each function, architectures have become more optimized and performant, but also more complex. Solutions built on Amazon Redshift are often used alongside services such as AWS DMS, AWS AppSync, AWS Glue, AWS SCT, Amazon SageMaker, and Amazon QuickSight. One of the core challenges of building these solutions is often the integration of these services.

This solution takes advantage of the integrations that recur between these services in common use cases, and uses the AWS CDK to automate the provisioning of AWS analytics services, primarily Amazon Redshift. Deployment is reduced to customizing a JSON configuration file that indicates the resources to be used; the solution takes those inputs and provisions the required infrastructure dynamically.

PLEASE NOTE: This solution is meant for proof of concept or demo use cases, and not for production workloads.

Overview of Deployment
Prerequisites
1. Launching a VPC
2. Auto-assigning public IPv4 addresses
Deployment Steps
1. Configure the config file
2. Launch the staging template
Troubleshooting
Feedback

Overview of Deployment

This project consists of a two-phase deployment: the staging infrastructure, and the target infrastructure. The target infrastructure is the end-goal configuration of AWS analytics services which are needed for a POC or other use case. The staging infrastructure will launch an EC2 instance to run a CDK application which will provision the resources of this target infrastructure.

To achieve this, a JSON-formatted config file specifying the desired service configurations needs to be uploaded to an S3 bucket. The location of this file in S3 is used as a parameter in the CloudFormation stack, alongside further details of the staging infrastructure. Once the CloudFormation stack is launched, the resources are provisioned automatically.

Here you can see a diagram giving an overview of this flow:

The following sections give further details of how to complete these steps.

Prerequisites

In order to run the staging stack, some resources need to be preconfigured:

A VPC containing a public subnet that has IPv4 auto-assign enabled -- if either of these isn't configured, please see launching a VPC and auto-assigning public IPv4 addresses below
A key pair that can be accessed (see the documentation on how to create a new one)
If using DMS or SCT, source firewalls/security groups opened to allow traffic from AWS

If these are complete, continue to deployment steps.

Launching a VPC

An option for provisioning the VPC is to use the VPC Launch Wizard console -- you can see the details of the infrastructure launched using this wizard here.

Open the VPC Launch Wizard console linked above and press Select for creating a VPC with a single public subnet
Configure your desired VPC size, VPC name, subnet size, and subnet name -- other values can be kept as default
Press Create VPC

These resources will be sufficient for the staging infrastructure. If a manually provisioned VPC is preferred, it must contain at least one public subnet.

Auto-assigning public IPv4 addresses

To ensure instances launched in this subnet are auto-assigned public IPv4 addresses:

Navigate to the Subnets tab in the VPC console -- select the subnet you intend to use for your staging infrastructure (e.g. the subnet created with the launch wizard above), and under details, see whether the "Auto-assign public IPv4 address" value is Yes or No
If the value is No, select Actions > Modify auto-assign IP settings
Select the "Enable auto-assign public IPv4 address" checkbox
Press Save
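
The same check can also be scripted. The following is a minimal sketch, not part of this repo, assuming boto3 is installed and AWS credentials are configured; the function name and the subnet ID in the usage comment are placeholders. It relies on the EC2 `describe_subnets` and `modify_subnet_attribute` calls.

```python
def ensure_public_ip_auto_assign(ec2_client, subnet_id):
    """Enable auto-assign public IPv4 on a subnet if it is not already enabled.

    Returns True if the setting was changed, False if it was already enabled.
    """
    response = ec2_client.describe_subnets(SubnetIds=[subnet_id])
    subnet = response["Subnets"][0]
    if subnet.get("MapPublicIpOnLaunch"):
        return False
    # Equivalent to ticking "Enable auto-assign public IPv4 address" in the console.
    ec2_client.modify_subnet_attribute(
        SubnetId=subnet_id,
        MapPublicIpOnLaunch={"Value": True},
    )
    return True


# Usage (placeholder subnet ID):
#   import boto3
#   ensure_public_ip_auto_assign(boto3.client("ec2"), "subnet-0123456789abcdef0")
```

The client is passed in rather than created inside the function so the logic can be exercised with a stub before running it against a real account.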

Deployment Steps

In order to launch the staging and target infrastructures, download the user-config-template.json file and the CDKstaging.yaml file in this repo.

Configure the config file

The structure of the config file has two parts: (1) a list of key-value pairs, which map each service to whether it should be launched in the target infrastructure, and (2) configurations for the services that are launched in the target infrastructure. Open the user-config-template.json file and replace the values for the Service Keys in the first section with the appropriate Launch Value defined in the table below. If you're looking to create a resource, define the corresponding Configuration fields in the second section.

Service Key	Launch Values	Configuration	Description
`vpc_id`	`CREATE`, existing VPC ID	In case of `CREATE`, configure `vpc`: `on_prem_cidr`: CIDR block used to connect to VPC (for security groups) `vpc_cidr`: The CIDR block used for the VPC private IPs and size `number_of_az`: Number of Availability Zones the VPC should cover `cidr_mask`: The size of the public and private subnet to be launched in the VPC.	[REQUIRED] The VPC to launch the target resources in -- can either be an existing VPC or created from scratch.
`redshift_endpoint`	`CREATE`, `N/A`, existing Redshift endpoint	In case of `CREATE`, configure `redshift`: `cluster_identifier`: Name to be used in the cluster ID `database_name`: Name of the database `node_type`: `ds2.xlarge`, `ds2.8xlarge`, `dc1.large`, `dc1.8xlarge`, `dc2.large`, `dc2.8xlarge`, `ra3.xlplus`, `ra3.4xlarge`, or `ra3.16xlarge` `number_of_nodes`: Number of compute nodes `master_user_name`: Username to be used for Redshift database `subnet_type`: Subnet type the cluster should be launched in -- `PUBLIC` or `PRIVATE` (note: must already exist in the VPC) `encryption`: Whether the cluster should be encrypted -- `y`/`Y` or `n`/`N`	Launching a Redshift cluster.
`dms_instance_private_endpoint`	`CREATE`, `N/A`	Requires at least 2 subnets in different Availability Zones.	The DMS instance used to migrate data.
`dms_on_prem_to_redshift_target`	`CREATE`, `N/A`	Can only `CREATE` if you are also creating the DMS instance and Redshift cluster. In case of `CREATE`, configure `dms_on_prem_to_redshift`: `source_db`: Name of source database to migrate `source_engine`: Engine type of the source `source_schema`: Name of source schema to migrate `source_host`: DNS endpoint of the source `source_user`: Username of the database to migrate `source_port`: [INT] Port to connect on `migration_type`: `full-load`, `cdc`, or `full-load-and-cdc`	Creates a migration task and migration endpoints between a source and the Redshift cluster configured above.
`sct_on_prem_to_redshift_target`	`CREATE`, `N/A`	Can only `CREATE` if you are also creating the Redshift cluster. In case of `CREATE`, uses configuration from `dms_on_prem_to_redshift` (see above) and `sct_on_prem_to_redshift`: `key_name`: EC2 key pair name to be used for EC2 running SCT `s3_bucket_output`: S3 bucket to be used for SCT artifacts	Launches an EC2 instance and installs SCT to be used for schema conversion.

You can see an example of a completed config file under user-config-sample.json.
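
Before uploading, it can help to check the file against the dependency rules in the table above. The following is a minimal sketch, not a validator shipped with this repo. It assumes the Service Keys and the configuration sections (`vpc`, `redshift`, `dms_on_prem_to_redshift`) sit at the top level of a single JSON object; the function names are hypothetical, and only a subset of fields is checked.

```python
import json

# Allowed values taken from the table above.
MIGRATION_TYPES = {"full-load", "cdc", "full-load-and-cdc"}
SUBNET_TYPES = {"PUBLIC", "PRIVATE"}


def validate_config(config):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    def creating(service_key):
        return config.get(service_key) == "CREATE"

    # vpc_id is the only Service Key marked [REQUIRED].
    if not config.get("vpc_id"):
        problems.append("vpc_id is required: use CREATE or an existing VPC ID")
    if creating("vpc_id") and "vpc" not in config:
        problems.append("vpc_id is CREATE but the vpc section is missing")

    if creating("redshift_endpoint"):
        redshift = config.get("redshift", {})
        if redshift.get("subnet_type") not in SUBNET_TYPES:
            problems.append("redshift.subnet_type must be PUBLIC or PRIVATE")
        if str(redshift.get("encryption")).lower() not in {"y", "n"}:
            problems.append("redshift.encryption must be y/Y or n/N")

    if creating("dms_on_prem_to_redshift_target"):
        # The migration task needs both the DMS instance and the cluster.
        if not (creating("dms_instance_private_endpoint") and creating("redshift_endpoint")):
            problems.append(
                "dms_on_prem_to_redshift_target can only be CREATE when the "
                "DMS instance and Redshift cluster are also CREATE"
            )
        dms = config.get("dms_on_prem_to_redshift", {})
        if dms.get("migration_type") not in MIGRATION_TYPES:
            problems.append("migration_type must be full-load, cdc, or full-load-and-cdc")
        if not isinstance(dms.get("source_port"), int):
            problems.append("source_port must be an integer")

    if creating("sct_on_prem_to_redshift_target") and not creating("redshift_endpoint"):
        problems.append(
            "sct_on_prem_to_redshift_target can only be CREATE when the "
            "Redshift cluster is also CREATE"
        )
    return problems


def validate_config_file(path):
    """Load a JSON config file and validate it."""
    with open(path) as config_file:
        return validate_config(json.load(config_file))
```

Calling `validate_config_file("user-config-template.json")` on an edited template returns the list of problems, so an empty list is a reasonable signal that the file is ready to upload.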

Once all appropriate Launch Values and Configurations have been defined, upload the config file to an S3 bucket.
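
The upload can be done from the S3 console, or scripted. A minimal sketch, assuming boto3 is installed and credentials are configured; the function name, bucket name, and object key are placeholders, and the `s3://bucket/key` form is assumed to be the URI format the staging template expects.

```python
def upload_config(s3_client, local_path, bucket, key="user-config.json"):
    """Upload the config file and return its S3 URI."""
    # upload_file(Filename, Bucket, Key) is the standard boto3 S3 client call.
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


# Usage (placeholder bucket name):
#   import boto3
#   uri = upload_config(boto3.client("s3"), "user-config-template.json", "my-poc-bucket")
```

The returned URI is the value to keep for the Configuration File field when launching the staging template.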

Launch the staging template

Open the CloudFormation console and under Create stack select With new resources (standard)
Select Upload a template file under Specify template and choose the downloaded CDKstaging.yaml file, then press Next

Fill in the fields with the following values:

Field Name	Value
Stack name	A name to be used for the launched CloudFormation stacks
Configuration File	The URI of the config file uploaded to S3 in the previous section
EC2 AMI	The AMI to be used for the staging instance -- do not change unless required for compliance
Key Pair	Select the key pair in your account to be used to SSH into the staging instance
On Prem CIDR	The CIDR to be used to SSH into the staging instance
Subnet ID	Select the public subnet with IPv4 auto-assign enabled from the prerequisites
Source Password	Password of the source database

An example:

Press Next

To make troubleshooting easier, under Stack creation options, select Disabled under "Rollback on failure" -- press Next
At the bottom of the page, select the IAM acknowledgment and press Create stack to launch the stack

At this point the launch will be initiated. Please see troubleshooting below if the stack launch stalls at any point.
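
The console steps above map onto a single CloudFormation `create_stack` call. The sketch below only builds the arguments for that call. It is an illustration under assumptions: the parameter keys in the usage comment are hypothetical (the real logical names are in the Parameters section of CDKstaging.yaml), and both IAM capabilities are passed because it is not stated here which one the template needs.

```python
def build_create_stack_kwargs(stack_name, template_body, parameters):
    """Build keyword arguments for the boto3 CloudFormation create_stack call.

    `parameters` maps template parameter keys to values.
    """
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": str(value)}
            for key, value in parameters.items()
        ],
        # Corresponds to the IAM acknowledgment checkbox in the console.
        "Capabilities": ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
        # Corresponds to "Rollback on failure: Disabled".
        "DisableRollback": True,
    }


# Usage (parameter keys below are hypothetical placeholders):
#   import boto3
#   with open("CDKstaging.yaml") as template:
#       kwargs = build_create_stack_kwargs(
#           "redshift-poc",
#           template.read(),
#           {"ConfigurationFile": "s3://my-poc-bucket/user-config.json"},
#       )
#   boto3.client("cloudformation").create_stack(**kwargs)
```

Keeping rollback disabled leaves the staging EC2 instance in place after a failure, which is what makes the logs described under troubleshooting reachable.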

Troubleshooting

If the template stalls, logs of CloudFormation/CDK events and errors are generated on the staging EC2 instance. They can be accessed by connecting to the instance.

Navigate to the EC2 console and select the checkbox next to the EC2 instance named "[your CloudFormation stack name]-EC2Instance"
In the top right corner, press Connect
Choose the tab corresponding to the preferred connection option and follow the instructions
1. In the case that you choose to connect using the browser-based EC2 Instance Connect console, please see this page about troubleshooting connections to EC2 Instance Connect for instructions on how to configure IAM permissions and security groups
Run sudo tail -35f /var/log/cloud-init-output.log to access the logs
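
If the log has been copied off the instance, the same last-35-lines view can be reproduced and filtered in Python. This is a small sketch using only the standard library; the function names are hypothetical, and the marker strings used to flag likely failures are an assumption rather than a format the log guarantees.

```python
from collections import deque


def last_lines(path, count=35):
    """Return the last `count` lines of a text file, without trailing newlines."""
    with open(path, "r", errors="replace") as log_file:
        # A bounded deque keeps only the most recent `count` lines in memory.
        return [line.rstrip("\n") for line in deque(log_file, maxlen=count)]


def find_error_lines(lines, markers=("error", "failed", "traceback")):
    """Return the lines that contain any marker, case-insensitively."""
    return [line for line in lines if any(marker in line.lower() for marker in markers)]


# Usage on a copy of the log:
#   for line in find_error_lines(last_lines("cloud-init-output.log")):
#       print(line)
```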

Feedback

Our aim is to make this tool as dynamic and comprehensive as possible, so we’d love to hear your feedback. Let us know your experience deploying the solution, and share any other use cases that the automation solution doesn’t yet support. Please use the Issues tab under this repo, and we’ll use that to guide our roadmap.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
