
terraforming-aws's Introduction

DEPRECATION

This repo is going to be archived. The terraform templates that should be used for deploying an Ops Manager, PAS, and/or PKS can be found at https://github.com/pivotal/paving. No PRs or issues will be responded to here.

Terraforming AWS

What is this?

A set of terraform modules for deploying Ops Manager, PAS, and PKS infrastructure requirements, such as:

  • Friendly DNS entries in Route53
  • An RDS instance (optional)
  • A Virtual Private Cloud (VPC), subnets, and security groups
  • Necessary S3 buckets
  • NAT Gateway services
  • Network Load Balancers
  • An IAM User with proper permissions
  • Tagged resources

Note: This is not an exhaustive list of the resources created; it will vary depending on your arguments and what you're deploying.

Prerequisites

Terraform CLI

brew update
brew install terraform

AWS Permissions

  • AmazonEC2FullAccess
  • AmazonRDSFullAccess
  • AmazonRoute53FullAccess
  • AmazonS3FullAccess
  • AmazonVPCFullAccess
  • IAMFullAccess
  • AWSKeyManagementServicePowerUser

Note: You will also need to create a custom policy like the following and attach it to the same user:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "KMSKeyDeletionAndUpdate",
            "Effect": "Allow",
            "Action": [
                "kms:UpdateKeyDescription",
                "kms:ScheduleKeyDeletion"
            ],
            "Resource": "*"
        }
    ]
}

Are you using Platform Automation?

Be sure to skip the creation of the Ops Manager VM. Do not include the vars listed here. If you create your Ops Manager using terraform, you will not be able to manage it with Platform Automation.

Deployment of the infrastructure is still required.
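
Per the ops_manager_ami variable described below, leaving that variable empty in terraform.tfvars skips the Ops Manager VM. A partial sketch (the rest of the var file still applies):

# terraform.tfvars (partial)
ops_manager_ami = ""   # empty string = no Ops Manager VM is created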

Deploying Infrastructure

First, you'll need to clone this repo. Then, depending on whether you're deploying PAS or PKS, perform the following steps:

  1. cd into the proper directory
  2. Create a terraform.tfvars file
  3. Populate the credentials file or environment variables
  4. Run terraform apply:
terraform init
terraform plan -out=pcf.tfplan
terraform apply pcf.tfplan

Var File

Copy the stub content below into a file called terraform.tfvars and put it in the root of this project. These vars will be used when you run terraform apply. You should fill in the stub values with the correct content.

env_name           = "some-environment-name"
region             = "us-west-1"
availability_zones = ["us-west-1a", "us-west-1c"]
ops_manager_ami    = "ami-4f291f2f"
rds_instance_count = 1
dns_suffix         = "example.com"
vpc_cidr           = "10.0.0.0/16"
use_route53        = true
use_ssh_routes     = true
use_tcp_routes     = true

ssl_cert = <<EOF
-----BEGIN CERTIFICATE-----
some cert
-----END CERTIFICATE-----
EOF

ssl_private_key = <<EOF
-----BEGIN RSA PRIVATE KEY-----
some cert private key
-----END RSA PRIVATE KEY-----
EOF

tags = {
    Team = "Dev"
    Project = "WebApp3"
}

Credentials

Create a credentials.yml file with the following contents:

provider "aws" {
  access_key = "YOUR_AWS_ACCESS_KEY"
  secret_key = "YOUR_AWS_SECRET_KEY"
  region     = "YOUR_AWS_REGION"
}

Alternatively, populate the following environment variables before running terraform plan:

$ export AWS_ACCESS_KEY_ID="anaccesskey"
$ export AWS_SECRET_ACCESS_KEY="asecretkey"
$ export AWS_DEFAULT_REGION="us-west-2"

See the Terraform documentation on the AWS Provider for more ways of providing credentials, especially if you are using EC2 roles or AWS_SESSION_TOKEN.

Variables

  • env_name: (required) An arbitrary unique name for namespacing resources
  • region: (required) Region you want to deploy your resources to
  • availability_zones: (required) List of AZs you want to deploy to
  • dns_suffix: (required) Domain to add environment subdomain to
  • hosted_zone: (optional) Parent domain already managed by Route53. If specified, the DNS records will be added to this Route53 zone instead of a new zone.
  • ssl_cert: (optional) SSL certificate for HTTP load balancer configuration. Required unless ssl_ca_cert is specified.
  • ssl_private_key: (optional) Private key for above SSL certificate. Required unless ssl_ca_cert is specified.
  • ssl_ca_cert: (optional) SSL CA certificate used to generate self-signed HTTP load balancer certificate. Required unless ssl_cert is specified.
  • ssl_ca_private_key: (optional) Private key for above SSL CA certificate. Required unless ssl_cert is specified.
  • tags: (optional) A map of AWS tags that are applied to the created resources. By default, the following tags are set: Application = Cloud Foundry, Environment = $env_name
  • vpc_cidr: (default: 10.0.0.0/16) Internal CIDR block for the AWS VPC.
  • use_route53: (default: true) Controls whether or not Route53 DNS resources are created.
  • use_ssh_routes: (default: true) Enable ssh routing
  • use_tcp_routes: (default: true) Controls whether or not tcp routing is enabled.

Ops Manager (optional)

  • ops_manager_ami: (optional) Ops Manager AMI, get the right AMI according to your region from the AWS guide downloaded from Pivotal Network (if set to "" no Ops Manager VM will be created)
  • optional_ops_manager_ami: (optional) Additional Ops Manager AMI, get the right AMI according to your region from the AWS guide downloaded from Pivotal Network
  • ops_manager_instance_type: (default: m4.large) Ops Manager instance type
  • ops_manager_private: (default: false) Set to true if you want Ops Manager deployed in a private subnet instead of a public subnet

S3 Buckets (optional) (PAS only)

  • create_backup_pas_buckets: (default: false)
  • create_versioned_pas_buckets: (default: false)

RDS (optional)

  • rds_instance_count: (default: 0) Number of RDS instances for your deployment; set to 1 if you would like one
  • rds_instance_class: (default: db.m4.large) Size of the RDS to deploy
  • rds_db_username: (default: admin) Username for RDS authentication

Isolation Segments (optional) (PAS only)

  • create_isoseg_resources: (optional) Set to 1 to create an HTTP load balancer across three zones for isolation segments (see the sketch after this list).
  • isoseg_ssl_cert: (optional) SSL certificate for Iso Seg HTTP load balancer configuration. Required unless isoseg_ssl_ca_cert is specified.
  • isoseg_ssl_private_key: (optional) Private key for above SSL certificate. Required unless isoseg_ssl_ca_cert is specified.
  • isoseg_ssl_ca_cert: (optional) SSL CA certificate used to generate self-signed Iso Seg HTTP load balancer certificate. Required unless isoseg_ssl_cert is specified.
  • isoseg_ssl_ca_private_key: (optional) Private key for above SSL CA certificate. Required unless isoseg_ssl_cert is specified.
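
A hedged terraform.tfvars sketch for enabling these resources, following the same heredoc style as the Var File stub above (certificate values are placeholders):

create_isoseg_resources = 1

isoseg_ssl_cert = <<EOF
-----BEGIN CERTIFICATE-----
some iso seg cert
-----END CERTIFICATE-----
EOF

isoseg_ssl_private_key = <<EOF
-----BEGIN RSA PRIVATE KEY-----
some iso seg cert private key
-----END RSA PRIVATE KEY-----
EOF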

Notes

You can choose whether or not you would like an RDS instance. By default, rds_instance_count is set to 0; setting it to 1 will deploy an RDS instance.

Note: RDS instances take a long time to deploy; keep that in mind. They're not required.

Tearing down environment

Note: This will only destroy resources deployed by Terraform. You will need to clean up anything deployed on top of that infrastructure yourself (e.g. by running om delete-installation)

terraform destroy

Looking to set up a different IaaS?

We have other terraform templates:

terraforming-aws's People

Contributors

acrmp, amohemed, cdutra, ciphercules, ciriarte, crhntr, cwlbraa, davewalter, desmondrawls, drawsmcgraw, genevieve, joshzarrabi, kcboyle, ljfranklin, making, matthewfischer, michelleheh, nevenc, nhsieh, notrepo05, rainmaker, rizwanreza, rowanjacobs, ryanmoran, shawntuatara, utako, voor, wayneadams, wendorf, zachgersh

terraforming-aws's Issues

Standalone NAT gateway lacks HA for egress routing

Per the NAT Gateway docs, the current routing through a standalone NAT gateway in the non-"Internetless" configuration creates a dependency for egress routing on the first availability zone. In the event that AZ1 goes down, the remaining AZs will go offline.

https://github.com/pivotal-cf/terraforming-aws/blob/master/modules/infra/nat.tf#L30-L36
https://github.com/pivotal-cf/terraforming-aws/blob/master/modules/infra/nat.tf#L46-L52

Would you consider merging a PR for this?
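
For illustration, a per-AZ layout could look roughly like the sketch below. This is a hypothetical sketch, not the repo's actual nat.tf; names such as aws_subnet.public_subnets and aws_subnet.private_subnets are assumptions.

# Hypothetical sketch: one NAT gateway per AZ, with each private subnet routing to its own AZ's gateway.
resource "aws_eip" "nat_eip" {
  count = "${length(var.availability_zones)}"
  vpc   = true
}

resource "aws_nat_gateway" "nat" {
  count         = "${length(var.availability_zones)}"
  allocation_id = "${element(aws_eip.nat_eip.*.id, count.index)}"
  subnet_id     = "${element(aws_subnet.public_subnets.*.id, count.index)}"
}

resource "aws_route_table" "private" {
  count  = "${length(var.availability_zones)}"
  vpc_id = "${var.vpc_id}"

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = "${element(aws_nat_gateway.nat.*.id, count.index)}"
  }
}

resource "aws_route_table_association" "private" {
  count          = "${length(var.availability_zones)}"
  subnet_id      = "${element(aws_subnet.private_subnets.*.id, count.index)}"
  route_table_id = "${element(aws_route_table.private.*.id, count.index)}"
}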

Make `psql` a prerequisite to running the control plane

You're SSH tunneling into the Ops Man instance and attempting to run psql on the local host, but the user might not have psql installed. Consider adding something like this to be a little more informative:

command -v psql >/dev/null 2>&1 || { echo >&2 "I require psql but it's not installed.  Aborting."; exit 1; }

Error message before:

Error: Error applying plan:

1 error(s) occurred:

* null_resource.create_databases: Error running command './db/create_databases.sh': exit status 127. Output: + ssh_socket=/tmp/session1
+ main
+ echo 'Creating Databases'
Creating Databases
+ local opsman_ssh_key_path=/tmp/opsman_ssh_key
+ echo '-----BEGIN RSA PRIVATE KEY-----
...
-----END RSA PRIVATE KEY-----
'
+ chmod 600 /tmp/opsman_ssh_key
+ local port=5432
+ trap cleanup EXIT
+ ssh -fNg -M -S /tmp/session1 -L 5432:{RDS_INSTANCE}.rds.amazonaws.com:5432 -i /tmp/opsman_ssh_key -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no [email protected]
Warning: Permanently added '3.82.132.105' (ECDSA) to the list of known hosts.
Unauthorized use is strictly prohibited. All access and activity
is subject to logging and monitoring.
+ sleep 5
+ PGPASSWORD={PGPASSWORD}
+ psql --host=127.0.0.1 --port=5432 --username=administrator --dbname=postgres
./db/create_databases.sh: line 28: psql: command not found
+ cleanup
+ ssh -S /tmp/session1 -O exit indubitably
Exit request sent.


Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

issue with sec group

I am using the version downloaded from PivNet
https://network.pivotal.io/api/v2/products/elastic-runtime/releases/11702/product_files/43457/download
it is from PAS 2.0 and is reported to be a 0.5.0 zip, but then...

it reports to be 0.2.0 in the "version" file, which looks surprising.
But maybe not - https://github.com/pivotal-cf/terraforming-aws/blob/master/version

So I am assuming I am really using 0.5.0...
This failed when Ops Manager created bosh/0 because of this commit, which removed the inbound rules on ports 6868 and 25555. I had to add those back, and also add 8443 from 10.0.0.0/16, to the pcf-ops-manager-security-group:
https://github.com/pivotal-cf/terraforming-aws/commit/3a9c45dd51e25cabeab5d8f79ae08f3a391be897

Created NS record has only 1 value (not 1 per name server)

Summary

When terraforming a control plane with env_name="cp" and dns_suffix="aws.63r53rk54v0r.com", the name server records are incorrectly being created as a single string instead of a list of strings.

Happy to be wrong on this, so please let me know if I've got things wrong.

Expected Behavior

(screenshots: expected AWS console record set and dig output)

Actual Behavior

(screenshots: actual AWS console record set and dig output)

Proposed fix

I've fixed the issue in a hard fork of terraforming-aws in this commit. I'm happy to make this a PR too.

Pin AWS terraform provider or else changes will take place without you knowing it

We don't specify a version for the aws terraform provider.

We pin the version of terraform used to 0.11.7, but as per the terraform doc, terraform provider versions can change at an independent rate. If you don't pin the version of the provider used, it can change without you knowing it.

When terraform-provider-aws 1.42.0 was released yesterday, our terraform broke and was unable to provision ACM certs correctly. When we pinned the version back to 1.41.0, everything went back to working just fine.

We just got bit by this in our sandbox environment. Luckily we caught it there before we rolled out any more changes to production. I'd strongly advise adding something like version = "<= 1.41.0" to the terraform provider configuration
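
A minimal sketch of what pinning could look like, assuming the provider block lives alongside the credentials configuration shown in the README (the exact constraint is whatever version you have tested):

provider "aws" {
  # Pin the provider so upstream releases can't silently change behavior.
  version = "<= 1.41.0"
  region  = "YOUR_AWS_REGION"
}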

Problems with opsmgr being handled by terraform

Opsmgr comes packaged as an AMI defined within terraform.

But when it comes time to upgrade the version of opsmgr, things get tricky. You have to...

  1. backup your opsmgr installation to something like s3
  2. find out the opsmgr AMI corresponding to your IaaS
  3. update/run the terraform to use that AMI
  4. re-import your opsmgr installation

This is a stateful operation; terraform doesn't know anything about what's actually running on the EC2 instance. It'll gladly blow it away and replace it with a new EC2 instance without any regard for running processes, persistent volumes, etc. So I, as an operator, am responsible for maintaining/backing up/restoring the state of opsmgr. This sounds like something BOSH is designed to do.

The pcf-pipelines project dealt with upgrading opsmgr by doing some funky stuff with cliaas to replace the VM. But even that wouldn't work if your VPC was managed by terraform. If you update the opsmgr VM without terraform being aware of it, the next time terraform runs it will replace your new opsmgr VM with the old AMI.
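
One hypothetical mitigation, not something these templates do today, is to tell terraform to ignore AMI drift on the Ops Manager instance so a manually replaced VM isn't clobbered on the next apply. A sketch, assuming the VM is a plain aws_instance resource:

resource "aws_instance" "ops_manager" {
  ami           = "${var.ops_manager_ami}"
  instance_type = "${var.ops_manager_instance_type}"
  subnet_id     = "${var.subnet_id}"   # assumption: whatever subnet the template normally uses

  lifecycle {
    # Hypothetical: don't recreate the VM just because the AMI changed out-of-band.
    ignore_changes = ["ami"]
  }
}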

I'm curious if there's a medium/long-term solution to this problem. Specifically: does Pivotal have any plans to refactor opsmgr into a BOSH-managed resource rather than a terraform-managed resource?

Thanks for your time, I appreciate it!

More policies needed than the ones described in the README

There are a handful of policies listed in the README file that are required by users using terraforming-aws but they aren't enough. I had to add a custom policy in order to successfully get terraforming-aws to pave the entire infrastructure it wants.

Here's the JSON for the custom policy I had to add. I think it should be described in the README file, at least.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "kms:ScheduleKeyDeletion",
                "kms:DisableKey",
                "kms:UpdateKeyDescription"
            ],
            "Resource": [
                "arn:aws:kms:*:*:*/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "kms:ListKeys",
                "kms:GenerateRandom",
                "kms:ListAliases",
                "kms:CreateKey"
            ],
            "Resource": "*"
        }
    ]
}

Still Creates RDS subnets and security groups if there's no RDS present

You can optionally set rds_instance_count to 0, which doesn't create an RDS instance. However, when this is 0, you still create rds_subnets, rds_subnet_group, and mysql_security_group -- none of which are needed since there's no RDS present.

You can use count = 0 to get rid of these, which (although still a little hackish) prevents the creation of those resources. Incoming PR.
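
A sketch of the idea, assuming resource names like aws_db_subnet_group.rds_subnet_group and aws_security_group.mysql_security_group (TF 0.11-era count gating):

# Hypothetical sketch: only create the RDS networking pieces when an RDS instance is requested.
resource "aws_db_subnet_group" "rds_subnet_group" {
  count      = "${var.rds_instance_count > 0 ? 1 : 0}"
  subnet_ids = ["${aws_subnet.rds_subnets.*.id}"]
}

resource "aws_security_group" "mysql_security_group" {
  count  = "${var.rds_instance_count > 0 ? 1 : 0}"
  vpc_id = "${var.vpc_id}"
}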

Document permission level on AWS account used by Terraform

We're trying to bootstrap OpsManager 1.9 on AWS using this repo. We ran into an issue wherein we don't know what set of IAM permissions are needed by the AWS user used by Terraform.

It would be nice if you could document the set of permissions needed. Otherwise, the only workaround is to give the AWS user full access.

Please configure GITBOT

Pivotal uses GITBOT to synchronize Github issues and pull requests with Pivotal Tracker.
Please add your new repo to the GITBOT config-production.yml in the Gitbot configuration repo.
If you don't have access you can send an ask ticket to the CF admins. We prefer teams to submit their changes via a pull request.

Steps:

  • Fork this repo: cfgitbot-config
  • Add your project to config-production.yml file
  • Submit a PR

If there are any questions, please reach out to [email protected].

PKS Support

Hi,

The PKS RelEng team has begun testing the product on AWS. Here are some of the infrastructure prerequisites for PKS:

  • opsman
  • VPC
  • subnets in each availability zone
  • security groups
  • DNS record for opsman
  • IAM Role with EC2, ELB, S3 permissions

As you can see, most of the resources above are already created by the templates; however, the IAM role is currently not part of them. PKS requires a subset of the resources that are currently being created by the templates. For instance, the product doesn't need S3 buckets, DNS records for TCP, Diego Brain SSH endpoints, or ELBs.

Instead of adding count = ${var.pks ? 0 : 1} to all the PAS-related resources, I'm proposing that common infrastructure resources live at the top level, PAS-related resources in a directory/module named pas, and PKS-related resources in pks:

terraforming-aws/
  iam.tf
  ...
  dns.tf
  pks/
    firewall.tf
    ...
    iam.tf
  pas/
    dns.tf (pas specific dns records)
    iam.tf
    ...
    elb.tf
    buckets.tf

Looking forward to hearing your feedback and if this aligns with any long-term plans you have for the templates. If this idea sounds good, I'd be open to making a PR for this change.

Thanks,

Kevin

The examples in the README are wrong

The examples in the "Running" section of the README show these variables:

-var "availability_zone1=us-west-1a"
-var "availability_zone2=us-west-1b"

It looks like this has been replaced with a single list variable called availability_zones. We should update the examples to reflect this.

The "Running" and "Variables" sections also lack any mention of the dns_suffix variable.

Ability to have LB DNS names as outputs

Hi guys,

Big fans of your work. The client I'm working with needs to be able to automate pulling the load balancer DNS names after they're created, in order to do things like automate BOSH deployments using the load balancers (Concourse and the ATCs is one of their use cases). I can't imagine we're the first/only ones that could use this, and overall I think it would be a great value add to the repository.

Specifically, could the following resources' DNS names be exposed as outputs (a sketch follows the links below):

https://github.com/pivotal-cf/terraforming-aws/blob/master/modules/control_plane/lb.tf#L56

https://github.com/pivotal-cf/terraforming-aws/blob/master/modules/pas/lbs.tf#L32

https://github.com/pivotal-cf/terraforming-aws/blob/master/modules/pas/lbs.tf#L108

https://github.com/pivotal-cf/terraforming-aws/blob/master/modules/pas/lbs.tf#L166
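
A sketch of what the requested outputs could look like, assuming load balancer resources such as aws_lb.web_lb and aws_lb.ssh_lb (the actual resource names in lb.tf / lbs.tf may differ):

# Hypothetical output sketches; adjust the resource references to match lb.tf / lbs.tf.
output "web_lb_dns_name" {
  value = "${aws_lb.web_lb.dns_name}"
}

output "ssh_lb_dns_name" {
  value = "${aws_lb.ssh_lb.dns_name}"
}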

ops_manager_security_group

The security group ops_manager_security_group is created with the following code:

resource "aws_security_group" "ops_manager_security_group" {
  name        = "ops_manager_security_group"
  description = "Ops Manager Security Group"
  vpc_id      = "${var.vpc_id}"

  ingress {
    cidr_blocks = ["${var.private ? var.vpc_cidr : "0.0.0.0/0"}"]
    protocol    = "tcp"
    from_port   = 22
    to_port     = 22
  }

  ingress {
    cidr_blocks = ["${var.private ? var.vpc_cidr : "0.0.0.0/0"}"]
    protocol    = "tcp"
    from_port   = 80
    to_port     = 80
  }

  ingress {
    cidr_blocks = ["${var.private ? var.vpc_cidr : "0.0.0.0/0"}"]
    protocol    = "tcp"
    from_port   = 443
    to_port     = 443
  }

  egress {
    cidr_blocks = ["${var.private ? var.vpc_cidr : "0.0.0.0/0"}"]
    protocol    = "-1"
    from_port   = 0
    to_port     = 0
  }

  tags = "${merge(var.tags, map("Name", "${var.env_name}-ops-manager-security-group"))}"
}

However, when the BOSH Director for AWS is configured and you then apply the changes, the installation fails unless you manually add access to the newly created bosh/0 instance (which is given the same security group), i.e. by adding an ingress rule that allows communication between all instances within the security group.

Result with default security group settings:

Started deploying
 Creating VM for instance 'bosh/0' from stemcell 'ami-0d32d60be3f351c8d light'... Finished (00:00:37)
 Waiting for the agent on VM 'i-0bf1b1565b39e1e3c' to be ready... Failed (00:10:16)
Failed deploying (00:10:54)

Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)


Deploying:
 Creating instance 'bosh/0':
   Waiting until instance is ready:
      Post https://vcap:<redacted>@10.0.16.5:6868/agent: dial tcp 10.0.16.5:6868: i/o timeout

Result with the following ingress rule added to ops_manager_security_group:
All traffic | All | All | sg-XXXXXXXXXXXXXXX (ops_manager_security_group)

Started deploying
 Deleting VM 'i-0bf1b1565b39e1e3c'... Finished (00:00:03)
 Creating VM for instance 'bosh/0' from stemcell 'ami-0d32d60be3f351c8d light'... Finished (00:00:22)
 Waiting for the agent on VM 'i-0f127560ddc71dd85' to be ready... Finished (00:02:33)
 Creating disk... Finished (00:00:09)

I have found adding this into modules/ops_manager_security_group.tf fixes it for me:

  ingress {
    self      = "true"
    protocol  = "-1"
    from_port = 0
    to_port   = 0
  }

I'm happy to submit a PR if required.

PKS install now fails since Ops Manager can no longer register the PKS instance with the load balancer.

Task 22

Task 22 | 15:11:35 | Preparing deployment: Preparing deployment (00:00:07)
Task 22 | 15:11:54 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 22 | 15:11:54 | Compiling packages: golang-1-linux/8fb48ae1b653b7d0b49d0cbcea856bb8da8a5700 (00:01:27)
Task 22 | 15:13:21 | Compiling packages: bosh-dns/138f3bd2440ba97f0a7d8912facb5d4a2b320850 (00:00:29)
Task 22 | 15:14:23 | Creating missing vms: pivotal-container-service/266bd7de-296c-4b44-805f-1d5ee9bd2f23 (0) (00:00:51)
                   L Error: Unknown CPI error 'Unknown' with message 'User: arn:aws:sts::798022768933:assumed-role/dev_om_role/i-01da0f1b791fe6c1b is not authorized to perform: elasticloadbalancing:RegisterInstancesWithLoadBalancer on resource: arn:aws:elasticloadbalancing:us-east-1:798022768933:loadbalancer/dev-pks-api' in 'create_vm' CPI method (CPI request ID: 'cpi-768931')
Task 22 | 15:15:14 | Error: Unknown CPI error 'Unknown' with message 'User: arn:aws:sts::798022768933:assumed-role/dev_om_role/i-01da0f1b791fe6c1b is not authorized to perform: elasticloadbalancing:RegisterInstancesWithLoadBalancer on resource: arn:aws:elasticloadbalancing:us-east-1:798022768933:loadbalancer/dev-pks-api' in 'create_vm' CPI method (CPI request ID: 'cpi-768931')

Task 22 Started  Fri Nov 16 15:11:35 UTC 2018
Task 22 Finished Fri Nov 16 15:15:14 UTC 2018
Task 22 Duration 00:03:39
Task 22 error


Updating deployment:
  Expected task '22' to succeed but state is 'error'
Exit code 1
===== 2018-11-16 15:15:14 UTC Finished "/usr/local/bin/bosh --no-color --non-interactive --tty --environment=10.0.16.5 --deployment=pivotal-container-service-e270cbb29aa9caf8fd27 deploy /var/tempest/workspaces/default/deployments/pivotal-container-service-e270cbb29aa9caf8fd27.yml"; Duration: 221s; Exit Status: 1
{"type": "step_finished", "id": "bosh.deploying.pivotal-container-service-e270cbb29aa9caf8fd27"}
Exited with 1.
could not execute "apply-changes": installation was unsuccessful

Either:

  • Change the PKS API load balancer to a Network Load Balancer, or
  • Bring back the missing permissions.

Are all the steps documented from running `terraform apply` to getting a running PAS?

Prompted by this slack convo: https://pivotal.slack.com/archives/C3LQ5CS2X/p1545240977058600. TL;DR: We didn't realize you needed to add custom vm_extensions to allow LB traffic to reach the GoRouter/SSH proxy. We stumbled onto this CI task which does this, but we didn't see any documentation telling us this was necessary.

At minimum, it would be nice to have the vm_extension instructions mentioned in the README and release notes. There's probably also a larger conversation around whether the steps needed to go from terraform apply to a running PAS are sufficiently documented (either in this repo or in official Pivotal docs). Feel free to ping me or anyone else from PAS RelEng team if you're interested in chatting more about this.

Creation of separate PKS opsmgr / s3 buckets for that PKS opsmgr?

I am under the impression it is best practice to have a separate opsmgr for PAS and PKS. I'd like to stand up a separate opsmgr for PKS, but have it be within the same foundation as PAS.

So two questions:

Thanks a ton. Any advice is appreciated!

Output has ops_manager_ssh_private_key as sensitive

Following the guide at:

https://docs.pivotal.io/pivotalcf/2-1/customizing/aws-om-config-terraform.html#aws-config

It says to look at the terraform output for the value of ops_manager_ssh_private_key. In a recent commit, that output has been marked as sensitive, so people reading the guide can't complete that step.

https://github.com/pivotal-cf/terraforming-aws/blob/master/outputs.tf#L170

Manually removing the sensitive flag in that outputs.tf file and re-running terraform refresh shows the private key and allows for the completion of the guide.

Add PKS related permissions in the Policy template.

The AWS Ops Manager policy terraforming-aws/modules/ops_manager/templates/iam_policy.json is fine for PAS, but for PKS you also need elasticloadbalancing:DeregisterInstancesFromLoadBalancer and elasticloadbalancing:RegisterInstancesWithLoadBalancer in the template.
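
A hedged sketch of how those actions could be granted, expressed with an aws_iam_policy_document data source rather than by editing the JSON template directly (the role name is a placeholder, not something defined by these templates):

# Hypothetical sketch: extra ELB permissions PKS needs, attached to the Ops Manager role.
data "aws_iam_policy_document" "pks_elb" {
  statement {
    effect = "Allow"

    actions = [
      "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
      "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
    ]

    resources = ["*"]
  }
}

resource "aws_iam_role_policy" "pks_elb" {
  name   = "pks-elb"
  role   = "YOUR_OPS_MANAGER_ROLE_NAME"   # placeholder
  policy = "${data.aws_iam_policy_document.pks_elb.json}"
}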

Error uploading server certificate, error: MalformedCertificate

Getting this error while running terraform apply:
aws_iam_server_certificate.cert: [WARN] Error uploading server certificate, error: MalformedCertificate: Unable to parse certificate. Please ensure the certificate is in PEM format.

I have this config in my terraform.tfvars

env_name           = "sn"
access_key         = "AKIXXXXX"
secret_key         = "XXXXXXXXXXXXXX"
region             = "us-east-1"
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
ops_manager_ami    = "ami-8406d8fb"
dns_suffix         = "pcf.we.bs"

ssl_cert = "pcf.we.bs.crt"

ssl_private_key = "pcf.we.bs.key"

The pcf.we.bs.crt and pcf.we.bs.key files have been generated by using https://github.com/aws-quickstart/quickstart-pivotal-cloudfoundry/blob/master/scripts/gen_ssl_certs.sh

pcf.we.bs is a domain that I own.

Kindly help in resolving this issue
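
For comparison, the Var File section of this README expects the PEM contents themselves in terraform.tfvars (via a heredoc), not the file names, which is one likely cause of the MalformedCertificate error. A sketch:

ssl_cert = <<EOF
-----BEGIN CERTIFICATE-----
contents of pcf.we.bs.crt
-----END CERTIFICATE-----
EOF

ssl_private_key = <<EOF
-----BEGIN RSA PRIVATE KEY-----
contents of pcf.we.bs.key
-----END RSA PRIVATE KEY-----
EOF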

Using PCF terraform with AWS organization accounts

We have to set up PCF on an AWS organization account with federation enabled. There are some restrictions using this account; for example, we cannot create new IAM users. I have to deploy PCF using the shared account of the federation, which does not have permissions to create a new IAM user.

Using terraform to deploy PCF requires a new IAM user to be created. I have access to this shared account via a role (which is assumed).

I am getting this error:
aws_iam_user.ops_manager: Error creating IAM User bwce_om_user: AccessDenied: User: arn:aws:sts::XXXXXXXXXX:assumed-role/EC2Role/1547621556019946500 is not authorized to perform: iam:CreateUser on resource: arn:aws:iam::XXXXXXXXXXXX:user/bwce_om_user with an explicit deny

What needs to be changed in terraform so that there is no need to create a new IAM user and I can use the role instead?

Generated RDS passwords produce invalid MySQL URIs during PAS install

I had a foundation where terraform created an output of:

rds_password = i4fq]QpVzC7G&j4>

Later on in the PAS install, bosh fails with this error:

Task 203 | 20:46:06 | Updating instance cloud_controller: cloud_controller/a2cca9d0-2349-4c2c-a836-964569491c84 (0) (canary) (00:04:08)
                    L Error: Action Failed get_task: Task c0640d40-f8f0-4f9f-6187-8964e0d85dd1 result: 1 of 6 pre-start scripts failed. Failed Jobs: cloud_controller_ng. Successful Jobs: routing-api, route_registrar, syslog_forwarder, bosh-dns, consul_agent.

Digging further into the cloud Controller VM and the log, /var/vcap/sys/log/cloud_controller_ng/pre-start.stderr.log, I get this error:

[2018-02-02 20:50:08+0000] URI::InvalidURIError: bad URI(is not URI?): mysql2://rds_admin:i4fq]QpVzC7G&j4%[email protected]:3306/ccdb

The > in the password gets changed to the URI-encoded value %3E, which is still not the correct password for RDS.
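
A hedged sketch of generating an RDS password that stays URI-safe, assuming the random provider's random_string resource is (or could be) used for the password:

resource "random_string" "rds_password" {
  length           = 16
  special          = true
  override_special = "-_"   # avoid characters such as >, &, and ] that break mysql2:// URI parsing
}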

Missing entry for AWS region "us-east-2"

When attempting to use the terraform config (0.5.0), I received the following error:

Error: aws_instance.nat: 1 error(s) occurred:

* aws_instance.nat: lookup: lookup failed to find 'us-east-2' in:

${lookup(var.nat_ami_map, var.region)}

I've submitted pull request #24 based on an AMI I found; see the discussion on the PR.

ec2:copyimage

I have heard the feedback that "In the IAM policy for OpsMan, we also need the ec2:copyimage policy. Otherwise, we got error message while uploading stemcells"

https://pivotal.slack.com/archives/C3LQ5CS2X/p1517956117000002

Yet, we've been able to completely install PCF with the terraform scripts. I suspect this is because our pipelines upload light stemcells, but that ec2:copyimage is used for non-light stemcells.

AWS Roles invalid

The following recommended roles do not exist on AWS anymore:

  • AmazonIAMFullAccess
  • AmazonKMSFullAccess

Without some additional roles, which I suppose were encompassed in the ones above, the terraform fails.

It would be great to update these to whatever they have been renamed to / moved to on AWS today.

(screenshot)

All other roles as defined in the README are still available:

(screenshot)

IAM User doesn't have enough permissions

The terraform scripts create a new IAM user with various permissions, but it doesn't seem to be enough for kickstarting the Ops Manager and BOSH Director creation.

Here's the error message (when "Apply Changes" for the first time after configuring Director tile):

creating stemcell (bosh-aws-xen-hvm-ubuntu-trusty-go_agent 3468.21):
  CPI 'create_stemcell' method responded with error: CmdError{"type":"Unknown","message":"You are not authorized to perform this operation.","ok_to_retry":false}

There are two policies created for the IAM user:
DEPLOYMENT_ert (https://github.com/pivotal-cf/terraforming-aws/files/1706600/ert.json.txt)
DEPLOYMENT_ops_manager (https://github.com/pivotal-cf/terraforming-aws/files/1706601/ops-manager.json.txt)

Error goes away when I use another IAM user with "AdministratorAccess" role.

OOTB scripts use hosted zones/subdomains which means ACM can't be used

The hosted zones/subdomains are created by the Terraform scripts, which means you can't use an ACM cert, because ACM expects the required domains to already be configured. This makes it really difficult to use the OOTB terraform scripts for AWS. It also increases pain during POCs and requires too many modifications.

Minimum set of AWS IAM policies

Notice that a set of IAM policies is defined here: https://github.com/pivotal-cf/terraforming-aws/blob/master/ops_manager/templates/iam_policy.json. To run the terraform script on AWS, does one need to be an admin or a privileged user?

I've got a unique situation where policies are controlled by an organization, and they require a policy to be set specifically for running the terraform template. This means the policy would need to be referenced directly instead of being set explicitly. Without forking the repository, is there a way to accommodate this?

OOTB script doesn't have ACM support

Most AWS customers will use an ACM cert, at least for POCs. The path to a POC should be easy with the PCF Terraform scripts we release on PivNet.

Currently, you can't use an ACM certificate ARN (you need a PEM private key, which is not possible for an ACM cert).

k8s clusters are unable to create Load Balancers

Problem

Since we are using existing subnets, and the cluster is deployed into a private subnet, services are unable to create load balancers.

How to recreate

  1. Create a file like load-balancer.yml or similar and put the contents into it:
    kind: Service
    apiVersion: v1
    metadata:
      name: my-service
    spec:
      selector:
        app: MyApp
      ports:
        - protocol: TCP
          port: 443
          targetPort: 443
      type: LoadBalancer
    
  2. Create this load balancer for a pks created cluster:
    kubectl apply -f load-balancer.yml
    
  3. Result is this:
        {
                "apiVersion": "v1",
                "count": 7,
                "eventTime": null,
                "firstTimestamp": "2018-10-31T21:08:32Z",
                "involvedObject": {
                    "apiVersion": "v1",
                    "kind": "Service",
                    "name": "my-service",
                    "namespace": "default",
                    "resourceVersion": "33581",
                    "uid": "1f345c73-dd51-11e8-868d-12039f4a481a"
                },
                "kind": "Event",
                "lastTimestamp": "2018-10-31T21:13:48Z",
                "message": "Error creating load balancer (will retry): failed to ensure load balancer for service default/my-service: could not find any suitable subnets for creating the ELB",
                "metadata": {
                    "creationTimestamp": "2018-10-31T21:08:32Z",
                    "name": "my-service.1562cda09d9fa18b",
                    "namespace": "default",
                    "resourceVersion": "34040",
                    "selfLink": "/api/v1/namespaces/default/events/my-service.1562cda09d9fa18b",
                    "uid": "1f5261d3-dd51-11e8-868d-12039f4a481a"
                },
                "reason": "CreatingLoadBalancerFailed",
                "reportingComponent": "",
                "reportingInstance": "",
                "source": {
                    "component": "service-controller"
                },
                "type": "Warning"
            }
    

How to solve (temporary workaround)

You need to get the kubernetes cluster name from bosh, since it's actually the bosh deployment name.
Once you have that, you can add these tags in (work in progress; apologies for how dirty this is):
voor@811d718

2 of the 3 tags are generic and can always be there; it's the kubernetes.io/cluster/service-instance_4a7a5305-88dc-4d90-9785-fc86b08c3d08 tag that is specific to your kubernetes cluster, and I'm unsure how to apply that each time a new cluster is created.
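
A hedged sketch of the workaround via the existing tags variable, assuming that map is propagated to the subnets the clusters use (the cluster-specific key below comes from the example above and must match your own bosh deployment name):

# terraform.tfvars (partial, hypothetical)
tags = {
  "kubernetes.io/cluster/service-instance_4a7a5305-88dc-4d90-9785-fc86b08c3d08" = "shared"
}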

Add an output for the created pks_api_lb_security_group

Hi team,
In my scenario I want to use the PKS API over the internet. As the terraform creates an internet-facing load balancer (with no option at the moment to create it as internal?) and the needed VM security group "pks_api_lb_security_group", everything is available for doing so. However, I would like to automate my setup in pipelines and would like to use the terraform output to get the security group ID so I can push it to bosh in a vm_extension.

In short: Please create an output for the resource "pks_api_lb_security_group" in the module and in the output of "terraforming-pks". https://github.com/pivotal-cf/terraforming-aws/blob/master/modules/pks/lb.tf#L2
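
A sketch of the requested output, assuming the security group resource in modules/pks/lb.tf is named pks_api_lb_security_group (the module wiring and name "pks" are assumptions):

# In the pks module: hypothetical sketch
output "pks_api_lb_security_group_id" {
  value = "${aws_security_group.pks_api_lb_security_group.id}"
}

# In terraforming-pks, re-exported from the module
output "pks_api_lb_security_group_id" {
  value = "${module.pks.pks_api_lb_security_group_id}"
}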
