AWS Backup

This is an AWS Backup implementation using Terraform with security and operational best practices in mind.

The following services are supported:

  • RDS
  • EBS
  • EFS
  • DynamoDB

Workflow

  • AWS Backup selects the resources to back up using resource tags. The value of the tag determines which backup plan is used.

  • A Lambda function identifies resources without the backup_policy tag, auto-tags those resources with the default backup plan and notifies the operations team.

  • Backups are performed using the AWS Backup service. All backups are stored in a backup vault named backup_vault.

Security

This Terraform configuration adds extra security to the AWS Backup vault by applying a resource policy that prevents any user from:

  • Removing the recovery points
  • Removing the backup vault
  • Changing or removing the resource policy which imposes the previous restrictions

This means that only the root account will ever be able to remove this backup vault! The backup vault will survive even in a scenario where a privileged IAM principal with *:* permissions is compromised.
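
As a rough illustration of the restrictions listed above, a vault resource policy could look like the following. This is a minimal sketch, not necessarily the policy shipped in this repository: the statement ID and the resource labels vault_lockdown and lockdown are assumptions, and the action list is one plausible mapping of the three bullets to AWS Backup actions.

```hcl
# Sketch only: the vault name comes from this README; the labels
# "vault_lockdown" and "lockdown" and the statement ID are illustrative.
resource "aws_backup_vault" "backup_vault" {
  name = "backup_vault"
}

data "aws_iam_policy_document" "vault_lockdown" {
  statement {
    sid    = "DenyDestructiveActions"
    effect = "Deny"

    # Applies to every principal, including highly privileged IAM users
    principals {
      type        = "*"
      identifiers = ["*"]
    }

    actions = [
      "backup:DeleteRecoveryPoint",           # removing recovery points
      "backup:DeleteBackupVault",             # removing the vault
      "backup:PutBackupVaultAccessPolicy",    # changing this policy
      "backup:DeleteBackupVaultAccessPolicy", # removing this policy
    ]

    resources = [aws_backup_vault.backup_vault.arn]
  }
}

resource "aws_backup_vault_policy" "lockdown" {
  backup_vault_name = aws_backup_vault.backup_vault.name
  policy            = data.aws_iam_policy_document.vault_lockdown.json
}
```

Because the statement also denies changes to the policy itself, test it on a disposable vault first.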

Backup plan customization

Review the backup-plan.tf file and customize the aws_backup_plan resources to match your company policies. This is an example resource definition:

resource "aws_backup_plan" "daily_two_weeks" {
  name = "daily_two_weeks"

  rule {
    rule_name         = "daily_two_weeks"
    target_vault_name = aws_backup_vault.backup_vault.name

    # Every day at 3am (UTC)
    schedule = "cron(0 3 * * ? *)"

    # Delete recovery points after 14 days
    lifecycle {
      delete_after = 14
    }
  }
}

Customize the name, schedule and lifecycle to match your company requirements. Then create a selector similar to the following:

resource "aws_backup_selection" "daily_two_weeks_selection" {
  plan_id      = aws_backup_plan.daily_two_weeks.id
  name         = "daily_two_weeks_selection"
  iam_role_arn = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/service-role/AWSBackupDefaultServiceRole"

  # Select every resource tagged backup_policy = daily_two_weeks
  selection_tag {
    type  = "STRINGEQUALS"
    key   = "backup_policy"
    value = "daily_two_weeks"
  }
}

The aws_backup_selection resource determines which resources an aws_backup_plan applies to. In this case, every resource whose backup_policy tag has the value daily_two_weeks is selected and associated with the plan referenced by plan_id.

The backup-plan.tf file contains a more complex backup plan, inspired by the Grandfather-father-son strategy.
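
For readers unfamiliar with Grandfather-father-son rotation, the idea can be sketched as a single plan with three rules. The plan name, rule names, schedules and retention periods below are illustrative assumptions, not the values used in backup-plan.tf, and the sketch assumes aws_backup_vault.backup_vault is defined elsewhere in the configuration.

```hcl
# Illustrative GFS-style plan; all names and retention values are assumptions.
resource "aws_backup_plan" "gfs_example" {
  name = "gfs_example"

  # Son: daily backups, kept for one week
  rule {
    rule_name         = "daily"
    target_vault_name = aws_backup_vault.backup_vault.name
    schedule          = "cron(0 3 * * ? *)"

    lifecycle {
      delete_after = 7
    }
  }

  # Father: weekly backups on Sunday, kept for five weeks
  rule {
    rule_name         = "weekly"
    target_vault_name = aws_backup_vault.backup_vault.name
    schedule          = "cron(0 3 ? * SUN *)"

    lifecycle {
      delete_after = 35
    }
  }

  # Grandfather: monthly backups on the 1st, kept for one year
  rule {
    rule_name         = "monthly"
    target_vault_name = aws_backup_vault.backup_vault.name
    schedule          = "cron(0 3 1 * ? *)"

    lifecycle {
      delete_after = 365
    }
  }
}
```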

Customize notification emails

Edit the variables.tf configuration to define the to and from email addresses that the Lambda function uses to send notifications.

The from email address requires an SES verification. In other words, after applying these Terraform configs you will have to go to the email inbox for the from email address and click on a verification link that allows the Lambda function to send emails from this address.
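
A minimal sketch of how the two addresses and the SES verification might be declared. The variable names notification_from_email and notification_to_email are hypothetical, so check variables.tf for the names this project actually uses.

```hcl
# Hypothetical variable names; the real ones live in variables.tf.
variable "notification_from_email" {
  type        = string
  description = "Sender address for Lambda notifications (must be verified in SES)"
}

variable "notification_to_email" {
  type        = string
  description = "Recipient address for Lambda notifications"
}

# Creating the identity makes SES send the verification email to this address.
resource "aws_ses_email_identity" "notification_from" {
  email = var.notification_from_email
}
```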

Installation

After customization, configure your credentials in ~/.aws/credentials and use the following commands to deploy:

cd aws-backup/ 

terraform init
terraform plan -var profile=awsbackup -var region=us-east-1
terraform apply -var profile=awsbackup -var region=us-east-1

Manually tag all resources in your infrastructure with a tag named backup_policy whose value is the name of one of the aws_backup_plan resources. The Lambda function reports to the operations team any resource that AWS Backup can manage but that was not manually tagged.
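
If the resources themselves are managed with Terraform, the tag can be set in their definitions. The volume below is a made-up example; only the tag key and the value daily_two_weeks come from this README.

```hcl
# Example resource (hypothetical); the tag is what AWS Backup matches on.
resource "aws_ebs_volume" "data" {
  availability_zone = "us-east-1a"
  size              = 100

  tags = {
    backup_policy = "daily_two_weeks"
  }
}
```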

Regions

AWS Backup selects resources per region, so this solution needs to be deployed once for each region where your company creates resources.
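
The -var profile and -var region flags used during installation suggest a provider block along these lines. This is a sketch under that assumption; the actual variable declarations in this repository may differ.

```hcl
# Assumed declarations matching the -var flags used during installation.
variable "profile" {
  type        = string
  description = "Named profile from ~/.aws/credentials"
}

variable "region" {
  type        = string
  description = "Region this deployment of the backup solution targets"
}

provider "aws" {
  profile = var.profile
  region  = var.region
}
```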

Disabling backup

It is possible to disable backups for a specific resource using the tag backup_policy with value none. This will prevent AWS Backup from running backups on the resource and the Lambda function from sending notifications.
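
For a Terraform-managed resource, opting out could look like the following. The file system is a hypothetical example.

```hcl
# Hypothetical resource that should never be backed up.
resource "aws_efs_file_system" "scratch" {
  tags = {
    backup_policy = "none"
  }
}
```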

Auto-tagging resources

The Lambda function runs every day and inspects the infrastructure looking for resources which have no backups enabled (no backup_policy tag). When such a resource is found, the Lambda function will:

  • Auto-tag it with backup_policy: daily_two_weeks
  • Notify the infrastructure team, as they might want to change the backup policy and update the Terraform configs.
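
A daily run like the one described above is typically wired up with a scheduled EventBridge (CloudWatch Events) rule. The sketch below assumes the function is declared as aws_lambda_function.backup_auto_tagging; the rule and permission names are illustrative.

```hcl
# Fire once a day; the rule name is an assumption.
resource "aws_cloudwatch_event_rule" "daily_auto_tagging" {
  name                = "backup-auto-tagging-daily"
  schedule_expression = "rate(1 day)"
}

# Send the scheduled event to the auto-tagging function.
resource "aws_cloudwatch_event_target" "auto_tagging_lambda" {
  rule = aws_cloudwatch_event_rule.daily_auto_tagging.name
  arn  = aws_lambda_function.backup_auto_tagging.arn
}

# Allow EventBridge to invoke the function.
resource "aws_lambda_permission" "allow_events" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.backup_auto_tagging.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.daily_auto_tagging.arn
}
```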

Restoring a backup

The recommended steps for restoring a backup can be found in the AWS documentation.

Development

terraform fmt

Contributors

andresriancho

aws-backup's Issues

Improve Lambda function deployment steps

Right now the whole thing is broken:

resource "aws_lambda_function" "backup_auto_tagging" {
  filename = "lambda_functions/backup_auto_tagging.zip"
  function_name = "backup_auto_tagging"
  role = aws_iam_role.iam_role_lambda_backup_auto_tagging.arn
  handler = "lambda.handler"
  source_code_hash = filebase64sha256("lambda_functions/backup_auto_tagging.zip")

lambda_functions directory doesn't exist

backup_auto_tagging.zip doesn't exist
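
One possible way to avoid depending on a pre-built zip is to let Terraform package the function with the archive provider. This is a sketch, not the project's fix: the source directory, the output path and the runtime value are assumptions, since the excerpt above does not show them.

```hcl
# Package the function source at plan time (paths are hypothetical).
data "archive_file" "backup_auto_tagging" {
  type        = "zip"
  source_dir  = "${path.module}/lambda_functions/backup_auto_tagging"
  output_path = "${path.module}/build/backup_auto_tagging.zip"
}

resource "aws_lambda_function" "backup_auto_tagging" {
  filename         = data.archive_file.backup_auto_tagging.output_path
  function_name    = "backup_auto_tagging"
  role             = aws_iam_role.iam_role_lambda_backup_auto_tagging.arn
  handler          = "lambda.handler"
  runtime          = "python3.12" # assumption: not shown in the excerpt
  source_code_hash = data.archive_file.backup_auto_tagging.output_base64sha256
}
```

With this approach the zip is regenerated whenever the source files change, so nothing has to be built by hand before terraform apply.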

AWS Backup error handling and notification

The current implementation completely ignores errors during the backup creation. As a sysadmin I would like to receive a notification when a backup fails, so that I can investigate what went wrong and potentially trigger it manually.
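
One way this could be approached is an EventBridge rule that forwards failed backup jobs to an SNS topic. The sketch below is not part of the project: all resource names are illustrative, and the event pattern (source aws.backup, detail-type "Backup Job State Change") should be checked against the AWS Backup monitoring documentation before use.

```hcl
# Illustrative names throughout; verify the event pattern against AWS docs.
resource "aws_sns_topic" "backup_failures" {
  name = "backup-failures"
}

resource "aws_cloudwatch_event_rule" "backup_job_failed" {
  name = "backup-job-failed"

  event_pattern = jsonencode({
    source        = ["aws.backup"]
    "detail-type" = ["Backup Job State Change"]
    detail = {
      state = ["FAILED", "ABORTED", "EXPIRED"]
    }
  })
}

resource "aws_cloudwatch_event_target" "backup_failures_sns" {
  rule = aws_cloudwatch_event_rule.backup_job_failed.name
  arn  = aws_sns_topic.backup_failures.arn
}

# Allow EventBridge to publish to the topic.
data "aws_iam_policy_document" "allow_events_publish" {
  statement {
    effect    = "Allow"
    actions   = ["sns:Publish"]
    resources = [aws_sns_topic.backup_failures.arn]

    principals {
      type        = "Service"
      identifiers = ["events.amazonaws.com"]
    }
  }
}

resource "aws_sns_topic_policy" "backup_failures" {
  arn    = aws_sns_topic.backup_failures.arn
  policy = data.aws_iam_policy_document.allow_events_publish.json
}
```

An email or chat subscription on the topic would then deliver the notification to the sysadmin.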

SES secrets

Error: error reading Secrets Manager Secret Version: AccessDeniedException: Access to KMS is not allowed
	status code: 400, request id: 4ff6091e-fc13-4e0c-8805-24820cb106f6

  on ses.tf line 41, in resource "aws_secretsmanager_secret_version" "ses_smtp_user":
  41: resource "aws_secretsmanager_secret_version" "ses_smtp_user" {

Lambda function timeout

context.get_remaining_time_in_millis() (see the AWS Lambda documentation) can be used to get the remaining time in ms for the current invocation.

In some cases where there are MANY resources I found that the lambda function times out. Sadly this was found manually via CloudWatch Events logs. I would like to see these errors being reported to the admins.
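
Since Lambda counts timeouts in the function's Errors metric, a CloudWatch alarm on that metric is one way to surface them without reading logs. This is a sketch under assumptions: the alarm name and the alerts_sns_topic_arn variable are hypothetical, and the function name is taken from the excerpt in the deployment issue.

```hcl
# Hypothetical variable: an SNS topic the admins are subscribed to.
variable "alerts_sns_topic_arn" {
  type = string
}

resource "aws_cloudwatch_metric_alarm" "auto_tagging_errors" {
  alarm_name          = "backup-auto-tagging-errors"
  namespace           = "AWS/Lambda"
  metric_name         = "Errors" # includes invocations that timed out
  statistic           = "Sum"
  period              = 300
  evaluation_periods  = 1
  threshold           = 0
  comparison_operator = "GreaterThanThreshold"
  treat_missing_data  = "notBreaching"

  dimensions = {
    FunctionName = "backup_auto_tagging"
  }

  alarm_actions = [var.alerts_sns_topic_arn]
}
```

Raising the timeout argument on the aws_lambda_function resource (up to 900 seconds) may also help when there are many resources to inspect.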

Implement multi-region support

In 3ca7f20 I explain one of the most annoying limitations of the AWS Backup service: it only backs up resources in the region where the backup plan was created. If you want to back up resources in 5 regions, you need to duplicate all the configuration for each region.

The problem I've found is that running these won't work:

terraform plan -var profile=awsbackup -var region=us-east-1
terraform apply -var profile=awsbackup -var region=us-east-1
...
terraform plan -var profile=awsbackup -var region=us-west-2
terraform apply -var profile=awsbackup -var region=us-west-2

This fails because the solution uses IAM resources, which are global: on the second call to apply, Terraform finds the already existing user and exits. There are ways around it, but it would be nice to provide the users with an easy to use solution like:

terraform apply -var profile=awsbackup -var region=global
terraform apply -var profile=awsbackup -var region=us-east-1,us-west-2

The first command would deploy all global resources; the second would deploy the solution to the comma-separated regions.
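
One of the possible workarounds is to split the configuration into a global module and a regional module, and to instantiate the regional one once per provider alias. This sketch does not describe the current repository layout: the module paths, the backup_role_arn input and the matching module output are hypothetical.

```hcl
variable "profile" {
  type = string
}

# One aliased provider per target region.
provider "aws" {
  alias   = "us_east_1"
  region  = "us-east-1"
  profile = var.profile
}

provider "aws" {
  alias   = "us_west_2"
  region  = "us-west-2"
  profile = var.profile
}

# IAM and other global resources are created exactly once.
module "global" {
  source    = "./modules/global" # hypothetical path
  providers = { aws = aws.us_east_1 }
}

# Vault, plans and selections are created once per region.
module "backup_us_east_1" {
  source          = "./modules/regional" # hypothetical path
  providers       = { aws = aws.us_east_1 }
  backup_role_arn = module.global.backup_role_arn # hypothetical output
}

module "backup_us_west_2" {
  source          = "./modules/regional"
  providers       = { aws = aws.us_west_2 }
  backup_role_arn = module.global.backup_role_arn
}
```

A single terraform apply would then cover all regions, at the cost of adding one provider block and one module block per region.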
