openshift-aws-setup's Introduction

openshift-aws-setup

Overview

This is an Ansible automation playbook that provisions a small OpenShift environment (one master, a configurable number of application nodes) suitable for demos, POCs and small workshops. The playbook can deploy either Origin or Container Platform.

AWS-related configuration can be customized by modifying vars/aws-config.yml. Note that the number of application nodes is configurable; the default is 3.
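
The exact variable names vary by branch; the following is a hedged, illustrative excerpt of the kind of settings vars/aws-config.yml controls (the keys below are examples, not guaranteed names):

# Illustrative excerpt of vars/aws-config.yml -- key names are examples only
region: us-east-1               # AWS region to deploy into
node_count: 3                   # number of application nodes (default 3)
master_instance_type: m4.xlarge
node_instance_type: m4.large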

Usage

Please see the branch that matches the version of OpenShift you want to install for usage instructions and further details.

Network Topology

Network Diagram

A private VPC and DNS are used, and OpenShift is installed using the private IP addresses. This means the IP addresses never change, unlike EC2 public addresses, so the environment can be stopped and started as needed. A bastion is created as part of the installation; once the installation is complete it is no longer needed and may be stopped or terminated. Note that it can be handy to keep the bastion around in a stopped state in case you want to manually re-run the installation.

openshift-aws-setup's People

Contributors

ecwpz91, gnunn1, sjbylo, veretie


openshift-aws-setup's Issues

Add support for Let's Encrypt

It should be relatively straightforward to add Let's Encrypt support by running certbot after the VMs are created and the DNS is registered in Route 53, but before OpenShift itself is actually installed.
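
A minimal sketch of that flow, assuming a play targeting the master and hypothetical variables admin_email and master_public_hostname (the playbook's real names may differ):

- name: Obtain a Let's Encrypt certificate for the master
  hosts: masters
  become: yes
  tasks:
    - name: Install certbot
      yum:
        name: certbot
        state: present

    - name: Request a certificate in standalone mode (port 80 must be reachable)
      command: >
        certbot certonly --standalone --non-interactive --agree-tos
        -m {{ admin_email }} -d {{ master_public_hostname }}
      args:
        creates: /etc/letsencrypt/live/{{ master_public_hostname }}/fullchain.pem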

This could be expanded to wildcard support for the router once Let's Encrypt supports it.

Ansible 2.4 problems

Ansible 2.4 breaks a number of things:

  • DNS zone is no longer returned as part of a set
  • The subnet now requires an explicit availability zone (see the sketch after this list)
  • The teardown fails to delete the private zone for some reason
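
For the subnet item, a minimal sketch of the fix under Ansible 2.4, with placeholder vpc_id and CIDR values:

- name: Provision subnet
  ec2_vpc_subnet:
    state: present
    vpc_id: "{{ vpc_id }}"
    cidr: 10.0.1.0/24
    az: "{{ aws_region }}a"   # explicit availability zone, required under 2.4
  register: subnet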

Add option to set defaultNodeSelector

When using the environment generated by this script for workshops, it may be desirable to ensure the master remains unencumbered by user pods. Adding an option for this would translate into adding the following to the OpenShift inventory.cfg:

osm_default_node_selector="region=primary"

Alternatively, this may be something the user sets in the inventory themselves if desired, similar to enabling logging and metrics.
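
A sketch of the opt-in approach, assuming a hypothetical toggle variable set_default_node_selector and the inventory.cfg path used by the playbook:

- name: Keep user pods off the master via a default node selector
  lineinfile:
    path: "{{ playbook_dir }}/inventory.cfg"
    line: osm_default_node_selector="region=primary"
    insertafter: '^\[OSEv3:vars\]'
  when: set_default_node_selector | default(false) | bool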

TASK [setup-virtual-machines : allocate a new elastic IP and associate it with master] fails

Is there something required that isn't documented? This fails for me when trying to allocate the elastic IP.

fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "EC2ResponseError: 400 Bad Request\n\nInvalidParameterCombination: You must specify an allocation id when mapping an address to a VPC instance (request id: ca81cce1-02c9-441d-9e20-0bc2afdf9ca7)"}


PLAY [Create Infrastructure] *****************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************
ok: [localhost]

TASK [Verify Ansible Version] ****************************************************************************************
ok: [localhost] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [fail] **********************************************************************************************************
skipping: [localhost] => (item=rhsm_username)
skipping: [localhost] => (item=rhsm_password)
skipping: [localhost] => (item=rhsm_pool)

TASK [setup-vpc : Provision VPC] *************************************************************************************
ok: [localhost]

TASK [setup-vpc : Provision internet gateway] ************************************************************************
ok: [localhost]

TASK [setup-vpc : Provision subnet] **********************************************************************************
ok: [localhost]

TASK [setup-vpc : Set up public subnet route table] ******************************************************************
ok: [localhost]

TASK [setup-vpc : register vpc facts] ********************************************************************************
ok: [localhost]

TASK [setup-vpc : set availability zone fact] ************************************************************************
ok: [localhost]

TASK [setup-security-groups : create openshift-vpc] ******************************************************************
ok: [localhost]

TASK [setup-security-groups : create openshift-public-ingress] *******************************************************
ok: [localhost]

TASK [setup-security-groups : create openshift-public-egress] ********************************************************
ok: [localhost]

TASK [setup-security-groups : create openshift-ssh] ******************************************************************
ok: [localhost]

TASK [setup-virtual-machines : Find ami without ami id] **************************************************************
skipping: [localhost]

TASK [setup-virtual-machines : Find ami id with ami] *****************************************************************
ok: [localhost]

TASK [setup-virtual-machines : Provision master] *********************************************************************
ok: [localhost]

TASK [setup-virtual-machines : allocate a new elastic IP and associate it with master] *******************************
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "EC2ResponseError: 400 Bad Request\n\nInvalidParameterCombination: You must specify an allocation id when mapping an address to a VPC instance (request id: ca81cce1-02c9-441d-9e20-0bc2afdf9ca7)"}
to retry, use: --limit @/home/llange/git/openshift-aws-setup/openshift-playbook.retry

PLAY RECAP ***********************************************************************************************************
localhost : ok=14 changed=0 unreachable=0 failed=1
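
A hedged sketch of the likely fix: the ec2_eip module needs in_vpc: yes when the target instance lives in a VPC, otherwise EC2 rejects the association with exactly this InvalidParameterCombination error. The device_id value is a placeholder for however the playbook registers the master instance:

- name: allocate a new elastic IP and associate it with master
  ec2_eip:
    device_id: "{{ master_instance_id }}"
    in_vpc: yes
    region: "{{ aws_region }}"
  register: master_eip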

Let's Encrypt issues

When use_lets_encrypt is true, there are a couple of issues that need fixing:

  1. pyOpenSSL needs to be updated to a later version, as the version supplied by RHEL is too old

  2. The set_fact values for the certificate names don't propagate to the subsequent plays that need them (see the sketch below)
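
For issue 2, a minimal sketch, assuming the facts are set on localhost: values set with set_fact persist in hostvars for that host across plays in the same run, so later plays can read them explicitly instead of expecting them as local variables:

- name: Record certificate file names
  hosts: localhost
  tasks:
    - set_fact:
        lets_encrypt_cert: /etc/letsencrypt/live/example.com/fullchain.pem
        lets_encrypt_key: /etc/letsencrypt/live/example.com/privkey.pem

- name: Consume them in a later play against other hosts
  hosts: masters
  tasks:
    - debug:
        msg: "{{ hostvars['localhost'].lets_encrypt_cert }}"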

EIP associate

The EIP association isn't working. I'm trying to find out if it's an issue in the us-west-2 region for my account; EIP association worked for me in us-east-2 yesterday.

AWS StorageClass has no zone

The default AWS StorageClass created by the installer has no zone; the zone should be set to the same availability zone the installation is in.
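
A sketch of the desired default StorageClass with the zone pinned; us-east-1a is a placeholder for the installation's actual availability zone:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: us-east-1a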

Install the cockpit dashboard

Install the dashboard plugin and add the nodes. Also add any admin users to the wheel group on the master, or look at tokens.

sudo yum install cockpit-dashboard
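
A short sketch of how the playbook might automate this; admin_user is a placeholder for whatever variable names the admin account:

- name: Install the Cockpit dashboard plugin
  yum:
    name: cockpit-dashboard
    state: present

- name: Add the admin user to wheel on the master
  user:
    name: "{{ admin_user }}"
    groups: wheel
    append: yes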

Setting node-selector causes gluster install to fail

Setting the node selector to region=primary causes the Gluster install to fail: it seems to inherit the default node selector from the install inventory (osm_default_node_selector, i.e. region=primary) instead of relying solely on the storage node selector glusterfs=storage-host.

This could be fixed in one of three ways:

a. Remove osm_default_node_selector from the inventory; however, this makes the master and gluster nodes schedulable.

b. Add region=primary to the gluster nodes; somewhat better than (a) in that the master won't receive user pods, but the gluster nodes will.

c. Add the node selector post-install; not sure if this will work...

Improve registering VMs

If subscription-manager fails, which it does occasionally, the playbook keeps running and things go wrong, since packages cannot be updated or installed. We need to put in some logic to retry subscription-manager when it fails.
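
A minimal retry sketch, assuming registration goes through the redhat_subscription module; the task is retried a few times before the run is allowed to fail:

- name: Register with subscription-manager
  redhat_subscription:
    state: present
    username: "{{ rhsm_username }}"
    password: "{{ rhsm_password }}"
    pool_ids: "{{ rhsm_pool }}"
  register: rhsm_result
  retries: 5
  delay: 30
  until: rhsm_result is succeeded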

Use cloudformations template

Using a CloudFormation template would make it easier to delete the stack and would logically contain all resources in a single grouping.

Add option to enable cockpit

  • Add an admin RHEL user to the master
  • Modify the security group to open port 9090 (see the sketch after this list)
  • Make creating the OCP admin user required rather than optional, and explicitly specify it
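
A sketch of the security group change, assuming the rule belongs in the existing openshift-public-ingress group created by the setup-security-groups role:

- name: Open the Cockpit port
  ec2_group:
    name: openshift-public-ingress
    description: public ingress rules   # required by the module
    vpc_id: "{{ vpc_id }}"
    purge_rules: no
    rules:
      - proto: tcp
        from_port: 9090
        to_port: 9090
        cidr_ip: 0.0.0.0/0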

openshift-pre-reqs failed: No package matching 'docker-1.13.1' found available, installed or updated

Hi,

First of all: thanks for this nice openshift-aws-setup.
I tried to install this in different environments (e.g. us-west-1), but the following error appears:
TASK [openshift-pre-reqs : install pre-req packages on nodes] ************************************************************************
Wednesday 16 May 2018 11:39:16 +0200 (0:00:01.125) 0:13:00.056 *********
fatal: [52.53.196.38]: FAILED! => {"changed": false, "msg": "No package matching 'docker-1.13.1' found available, installed or updated", "rc": 126, "results": ["net-tools-2.0-0.22.20131004git.el7.x86_64 providing net-tools is already installed", "kexec-tools-2.0.15-13.el7.x86_64 providing kexec-tools is already installed", "No package matching 'docker-1.13.1' found available, installed or updated"]}
fatal: [54.177.24.34]: FAILED! => {"changed": false, "msg": "No package matching 'docker-1.13.1' found available, installed or updated", "rc": 126, "results": ["net-tools-2.0-0.22.20131004git.el7.x86_64 providing net-tools is already installed", "kexec-tools-2.0.15-13.el7.x86_64 providing kexec-tools is already installed", "No package matching 'docker-1.13.1' found available, installed or updated"]}
fatal: [13.56.168.241]: FAILED! => {"changed": false, "msg": "No package matching 'docker-1.13.1' found available, installed or updated", "rc": 126, "results": ["net-tools-2.0-0.22.20131004git.el7.x86_64 providing net-tools is already installed", "kexec-tools-2.0.15-13.el7.x86_64 providing kexec-tools is already installed", "No package matching 'docker-1.13.1' found available, installed or updated"]}

I guess yum can't find this specific docker version?

Maybe you could add the docker repo in the Ansible script? Or do you have another idea for a fix?

Regards,
d spree
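
One possible fix, sketched here as an assumption rather than a confirmed solution: on RHEL 7, docker-1.13.1 ships in the Extras channel, so enabling that repo before the pre-req task should let yum resolve the package:

- name: Enable the RHEL 7 Extras repo that carries docker
  command: subscription-manager repos --enable=rhel-7-server-extras-rpms
  become: yes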

Pod logs not working

On one cluster we noticed that fluentd was not running on any of the nodes because the logging-infra-fluentd label was missing from them. I don't know how that happened, as it was a fresh install.

If you encounter this, add the label and Fluentd will start on the nodes and logging will work.

For each node (node name is a placeholder):
oc label node <node-name> logging-infra-fluentd=true

AWS Cloud Provider

The AWS cloud provider does not work on the first install. Instead, you have to install everything with the cloud provider disabled, SSH to the bastion, uncomment the cloud provider parameters, and then run the installer from the bastion manually.
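
For reference, a hedged sketch of the inventory parameters that the workaround uncomments, expressed as an Ansible task (the inventory path is a placeholder):

- name: Enable the AWS cloud provider in the installer inventory
  blockinfile:
    path: "{{ playbook_dir }}/inventory.cfg"
    insertafter: '^\[OSEv3:vars\]'
    block: |
      openshift_cloudprovider_kind=aws
      openshift_cloudprovider_aws_access_key={{ aws_access_key }}
      openshift_cloudprovider_aws_secret_key={{ aws_secret_key }}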

No response running "install pre-req packages on nodes" task

Hello,

When running the "install pre-req packages on nodes" task using "openshift-enterprise" as the deployment_type and 3.10 as the version, the script does not respond. See the logs below; I interrupted execution after one hour. I was able to replicate it a few times.

When using "origin" instead, the task fails with "No package matching 'docker-1.13.1' found available." I use a RHEL 7.5 AMI.

Could it be related to my Red Hat subscription? I use the Employee SKU.

2018-08-12 16:12:09,307 p=86491 u=dsoffner | TASK [openshift-pre-reqs : install pre-req packages on nodes] *******************************************************************************
2018-08-12 16:12:09,308 p=86491 u=dsoffner | task path: /Users/dsoffner/dev/ansible/openshift/openshift-aws-setup/roles/openshift-pre-reqs/tasks/main.yml:2
2018-08-12 16:12:09,309 p=86491 u=dsoffner | Sunday 12 August 2018 16:12:09 +1000 (0:00:04.007) 0:15:57.048 *********
2018-08-12 16:12:09,376 p=86491 u=dsoffner | Read vars_file 'vars/aws-config.yml'
2018-08-12 16:12:09,461 p=86491 u=dsoffner | Using module file /Library/Python/2.7/site-packages/ansible/modules/packaging/os/yum.py
2018-08-12 16:12:09,479 p=86491 u=dsoffner | Read vars_file 'vars/aws-config.yml'
2018-08-12 16:12:09,521 p=86491 u=dsoffner | Using module file /Library/Python/2.7/site-packages/ansible/modules/packaging/os/yum.py
2018-08-12 16:12:09,585 p=86491 u=dsoffner | Escalation succeeded
2018-08-12 16:12:09,606 p=86491 u=dsoffner | Read vars_file 'vars/aws-config.yml'
2018-08-12 16:12:09,613 p=86491 u=dsoffner | Using module file /Library/Python/2.7/site-packages/ansible/modules/packaging/os/yum.py
2018-08-12 16:12:09,692 p=86491 u=dsoffner | Escalation succeeded
2018-08-12 16:12:09,721 p=86491 u=dsoffner | Using module file /Library/Python/2.7/site-packages/ansible/modules/packaging/os/yum.py
2018-08-12 16:12:09,735 p=86491 u=dsoffner | Escalation succeeded
2018-08-12 16:12:10,019 p=86491 u=dsoffner | Escalation succeeded
2018-08-12 17:18:11,511 p=86491 u=dsoffner | [ERROR]: User interrupted execution

Remove dynamic volumes on teardown

By default the environment uses dynamic volumes for metrics and logging; these are not currently removed automatically by the teardown script.
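
A teardown sketch, assuming the dynamically provisioned volumes carry the cluster tag that the cloud provider applies (the tag key is illustrative):

- name: Find dynamically provisioned EBS volumes
  ec2_vol_facts:
    region: "{{ aws_region }}"
    filters:
      "tag:kubernetes.io/cluster/{{ namespace }}": "{{ namespace }}"
  register: dynamic_volumes

- name: Delete them
  ec2_vol:
    id: "{{ item.id }}"
    region: "{{ aws_region }}"
    state: absent
  with_items: "{{ dynamic_volumes.volumes }}"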

Support installing 3.9

This is a reminder issue; the develop branch now supports 3.9 and will need merging to master after some additional testing.

Subscription error

I got this after running ./openshift-playbook-run.sh:

fatal: [localhost]: FAILED! => {"changed": false, "msg": "Variables required to register subscriptions are missing, please confirm that either rhsm_username, rhsm_password and rhsm_pool OR rhsm_key_id and rhsm_org_id is defined"}

I am trying to deploy Origin, not OCP, so this error should not appear.
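
A minimal guard sketch: only enforce the RHSM variables when deploying Container Platform, so Origin installs skip registration entirely (variable names follow the error message above):

- name: Verify subscription variables
  fail:
    msg: >
      Variables required to register subscriptions are missing; please define
      rhsm_username, rhsm_password and rhsm_pool OR rhsm_key_id and rhsm_org_id
  when:
    - deployment_type == 'openshift-enterprise'
    - rhsm_username is not defined or rhsm_password is not defined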

"msg": "No account found for the given parameters"

I error out here

TASK [setup-dns : add bastion dns] *****************************************************************************************
task path: /home/wreckedred/stacks/os-aws/dd-os/openshift-aws-setup-master/roles/setup-dns/tasks/main.yml:12
Thursday 20 September 2018 15:14:14 -0600 (0:00:02.134) 0:02:36.191 ****
Using module file /usr/lib/python2.7/dist-packages/ansible/modules/cloud/amazon/route53.py
ESTABLISH LOCAL CONNECTION FOR USER: wreckedred
EXEC /bin/sh -c 'python && sleep 0'
The full traceback is:
File "/tmp/ansible_PNNJDM/ansible_module_route53.py", line 659, in main
invoke_with_throttling_retries(commit, changes, retry_interval_in, wait_in, wait_timeout_in)
File "/tmp/ansible_PNNJDM/ansible_module_route53.py", line 452, in invoke_with_throttling_retries
raise e
fatal: [localhost]: FAILED! => {
"changed": false,
"invocation": {
"module_args": {
"alias": null,
"alias_evaluate_target_health": false,
"alias_hosted_zone_id": null,
"aws_access_key": null,
"aws_secret_key": null,
"command": "create",
"ec2_url": null,
"failover": null,
"health_check": null,
"hosted_zone_id": null,
"identifier": null,
"overwrite": true,
"private_zone": true,
"profile": null,
"record": "bastion.ose.local",
"region": null,
"retry_interval": "500",
"security_token": null,
"state": "create",
"ttl": 300,
"type": "A",
"validate_certs": true,
"value": [
"10.0.1.166"
],
"vpc_id": "vpc-0e9edb696e1e1a288",
"wait": true,
"wait_timeout": 300,
"weight": null,
"zone": "ose.local"
}
},
"msg": "No account found for the given parameters"
}

No account found....

setup vpc fails in region eu-west-1

TASK [setup-vpc : Provision subnet] *******************************************************************************************************************************************************************************
Thursday 07 December 2017 16:33:09 +0100 (0:00:00.834) 0:00:03.409 *****
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: Invalid type for parameter AvailabilityZone, value: None, type: <type 'NoneType'>, valid types: <type 'basestring'>
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Traceback (most recent call last):\n File "/tmp/ansible_VmS_6X/ansible_module_ec2_vpc_subnet.py", line 274, in \n main()\n File "/tmp/ansible_VmS_6X/ansible_module_ec2_vpc_subnet.py", line 262, in main\n check_mode=module.check_mode)\n File "/tmp/ansible_VmS_6X/ansible_module_ec2_vpc_subnet.py", line 189, in ensure_subnet_present\n subnet = create_subnet(conn, module, vpc_id, cidr, az, check_mode)\n File "/tmp/ansible_VmS_6X/ansible_module_ec2_vpc_subnet.py", line 127, in create_subnet\n new_subnet = get_subnet_info(conn.create_subnet(VpcId=vpc_id, CidrBlock=cidr, AvailabilityZone=az))\n File "/usr/lib/python2.7/site-packages/botocore/client.py", line 312, in _api_call\n return self._make_api_call(operation_name, kwargs)\n File "/usr/lib/python2.7/site-packages/botocore/client.py", line 575, in _make_api_call\n api_params, operation_model, context=request_context)\n File "/usr/lib/python2.7/site-packages/botocore/client.py", line 630, in _convert_to_request_dict\n api_params, operation_model)\n File "/usr/lib/python2.7/site-packages/botocore/validate.py", line 291, in serialize_to_request\n raise ParamValidationError(report=report.generate_report())\nbotocore.exceptions.ParamValidationError: Parameter validation failed:\nInvalid type for parameter AvailabilityZone, value: None, type: <type 'NoneType'>, valid types: <type 'basestring'>\n", "module_stdout": "", "msg": "MODULE FAILURE", "rc": 1}
to retry, use: --limit @/home/llange/git/openshift-aws-setup/openshift-playbook.retry

PLAY RECAP ********************************************************************************************************************************************************************************************************
localhost : ok=4 changed=0 unreachable=0 failed=1

Thursday 07 December 2017 16:33:10 +0100 (0:00:01.129) 0:00:04.538 *****

setup-vpc : Provision VPC ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1.44s
setup-vpc : Provision subnet ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1.13s
Gathering Facts -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.94s
setup-vpc : Provision internet gateway --------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.83s
Verify Ansible Version ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.04s
fail ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.04s

real 0m6.590s
user 0m3.938s
sys 0m0.688s
[llange@juri openshift-aws-setup]$ rpm -q ansible
ansible-2.4.1.0-2.fc26.noarch

Provide parameters: AWS access key [1] and AWS secret [2]

I exported values for AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and ran:

./openshift-playbook-run.sh -e rhsm_username=myuser -e rhsm_password=mypass -e rhsm_pool="mypool" my_aws_key_id_value my_aws_secret

which gives me:
Provide parameters: AWS access key [1] and AWS secret [2]

Cryptic error when provisioning master

Hello,

Thanks for providing this setup script.
I'm running into the following cryptic error when provisioning the master:

TASK [setup-virtual-machines : Provision master] *******************************************************************************************************************************************************************
Friday 10 August 2018 12:07:51 +1000 (0:00:02.774) 0:00:33.366 *********
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: list object has no element 0\n\nThe error appears to have been in '/Users/dsoffner/dev/ansible/openshift/openshift-aws-setup/roles/setup-virtual-machines/tasks/main.yml': line 22, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Provision master\n ^ here\n"}
to retry, use: --limit @/Users/dsoffner/dev/ansible/openshift/openshift-aws-setup/openshift-playbook.retry

My environment for running the script is as follows:

  • MacOS 10.13.6
  • Ansible 2.6.2
  • Python 2.7.10

Thanks, Daniel.

Support installing 3.7

This is in progress; some changes that need to be made include setting openshift_clusterid in the inventory and adding the tag kubernetes.io/cluster/{{namespace}}={{namespace}} to the EC2 instances.

We are also having some issues with the OpenShift installation failing in odd ways.
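
A sketch of the tagging half of that change, using the ec2_tag module against the already-provisioned instances (instance_ids is a placeholder for however the playbook collects them):

- name: Tag the EC2 instances with the cluster id
  ec2_tag:
    region: "{{ aws_region }}"
    resource: "{{ item }}"
    tags:
      "kubernetes.io/cluster/{{ namespace }}": "{{ namespace }}"
  with_items: "{{ instance_ids }}"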

ansible service catalog error "cannot allocate memory"

When enabling metrics and installing 3.7, the service catalog install fails with the error below. I suspect either the bastion or the master needs to be larger; trying the bastion first, since 16 GB should be fine for the master and the bastion is small at 2 GB.

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: OSError: [Errno 12] Cannot allocate memory
fatal: [ip-10-0-1-149.us-west-2.compute.internal]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}
[WARNING]: Could not create retry file '/usr/share/ansible/openshift-
ansible/playbooks/byo/config.retry'. [Errno 13] Permission denied:
u'/usr/share/ansible/openshift-ansible/playbooks/byo/config.retry'

Use ssl cert for cockpit

Now that SSL certificates are supported, we should also use them for Cockpit, which is available on port 9090. Since Cockpit uses the same domain as the OpenShift console, the same certificate can be used; it's just a matter of copying the files to the right locations.
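
A hedged sketch of the copy, assuming Let's Encrypt paths: Cockpit reads certificates from /etc/cockpit/ws-certs.d and, in this era, expects the certificate and key concatenated into a single .cert file:

- name: Install the console certificate for Cockpit
  shell: >
    cat /etc/letsencrypt/live/{{ public_hostname }}/fullchain.pem
        /etc/letsencrypt/live/{{ public_hostname }}/privkey.pem
        > /etc/cockpit/ws-certs.d/50-openshift.cert
  become: yes

- name: Restart Cockpit to pick up the certificate
  service:
    name: cockpit.socket
    state: restarted
  become: yes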

RHEL AMI & find

Is the default RHEL AMI in aws-config.yml private? The default AMI is listed in the RH documentation [0] for us-west-2, but that AMI ID isn't available when searching under my account console, and the script fails with an odd error. [0] https://access.redhat.com/articles/3085701

Error:

 task path: /home/jupittma/Dev/repos/openshift-aws-setup/roles/setup-virtual-machines/tasks/main.yml:22
fatal: [localhost]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'results'\n\nThe error appears to have been in '/home/jupittma/Dev/repos/openshift-aws-setup/roles/setup-virtual-machines/tasks/main.yml': line 22, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Provision master\n  ^ here\n"}

There are other RHEL AMIs available from RH. If I allow the AMI find to pull official RH AMIs (Owner ID 309956199498), the script finds at least one in us-west-2: ami-b55a51cc. Setting that ID in aws-config.yml fixes the above error.
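
A sketch of that workaround using ec2_ami_facts (Ansible 2.5+); the name filter pattern is an assumption based on Red Hat's published AMI naming:

- name: Find the latest official RHEL 7.5 AMI
  ec2_ami_facts:
    region: "{{ aws_region }}"
    owners: 309956199498
    filters:
      name: "RHEL-7.5_HVM_GA*"
  register: rhel_amis

- name: Use the newest image
  set_fact:
    ami_id: "{{ (rhel_amis.images | sort(attribute='creation_date') | last).image_id }}"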

Mac OS - Boto issue

Need to add the following entry to the inventory.cfg:

[local]
localhost ansible_connection=local ansible_python_interpreter=python

Otherwise, the Python libraries required for Boto (used by the Ansible EC2 modules) won't be found on macOS.
