OpenShift 4 on Exoscale

⚠️ WIP: This is still a work in progress and will change!

This repository provides a Terraform module to provision the infrastructure for an OpenShift 4 cluster on Exoscale.

Please see the VSHN OCP4 on Exoscale install how-to for a step-by-step installation guide.

Overview

The Terraform module in this repository provisions all the infrastructure which is required to setup an OpenShift 4 cluster on Exoscale using UPI (User-provisioned infrastructure).

The module manages the VMs (including their Ignition or cloud-init config), DNS zone and records, security groups, and floating IPs for a highly-available OpenShift 4 cluster.

By default, the module will provision all the VMs with public IPs (the default on Exoscale), and restricts access to the cluster VMs using Exoscale's security group mechanism. Out of the box, all the cluster VMs (which use Red Hat CoreOS) are reachable over SSH for debugging purposes using an SSH key which is provided during provisioning.

The module expects that a suitable RHCOS VM template is available in the Exoscale organisation and region in which the cluster is getting deployed.

The module also provisions a pair of load balancer VMs. The module uses vshn-lbaas-exoscale to provision the LBs.

Module input variables

The module provides variables to

  • control the instance size of each VM type (LB, bootstrap, master, infra, storage, and worker). Note that we don't officially support smaller instance sizes than the ones provided as defaults.
  • control the count of each VM type (LB, bootstrap, master, infra, storage, and worker). Note that we don't recommend changing the count for the LBs and masters from their default values.
  • control the size of the root partition for all nodes. This value is used for all nodes and cannot be customized for individual node groups.
  • control the size of the empty partition on worker or infra nodes. By default, worker and infra nodes are provisioned without an empty partition (by defaulting the variable to 0). However, users can create worker and infra nodes with an empty partition by providing a positive value for the variable.
  • control the size of the empty partition on the storage nodes. This partition can be used as backing storage by in-cluster storage clusters, such as Rook-Ceph.
  • configure additional worker node groups. This variable is a map from worker group names (used as node prefixes) to objects providing node instance size, node count, node data disk size, and node state.
  • configure additional affinity group IDs which are configured on all master, infra, storage, and worker VMs. This allows users to configure pre-existing affinity groups (e.g. for Exoscale dedicated VM hosts) for the cluster.
  • configure additional security group IDs which are configured on worker VMs. This allows users to configure pre-existing security groups (e.g. for node ports) for the worker nodes.
  • specify the cluster's id, name (optional), Exoscale region, base domain, SSH key, RHCOS template, and Ignition API CA.
  • enable PROXY protocol on the LBs for the ingress router.
  • configure additional Exoscale private networks to attach to the LBs. To avoid issues with network interfaces getting assigned arbitrarily, we recommend to only configure additional private networks after the LBs have been provisioned.
  • specify a bootstrap S3 bucket (required only to provision the bootstrap node)
  • specify an Exoscale API key and secret for Floaty
  • specify the username for the APPUiO hieradata Git repository (see next sections for details).
  • provide an API token for control.vshn.net (see next sections for details).
  • choose a dedicated deployment target. This allows for using dedicated hypervisors.

The cluster's domain is constructed from the provided base domain, cluster id and cluster name. If a cluster name is provided the cluster domain is set to <cluster name>.<base domain>. Otherwise the cluster domain is set to <cluster id>.<base domain>.
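The domain logic described above can be sketched as a Terraform local value. This is an illustrative sketch only; the variable and local names are hypothetical and not necessarily the module's actual internals:

```hcl
# Sketch of the cluster domain construction; names are illustrative.
locals {
  # Use the cluster name if one is provided, otherwise fall back to the
  # cluster id, and append the base domain in either case.
  cluster_domain = format(
    "%s.%s",
    var.cluster_name != "" ? var.cluster_name : var.cluster_id,
    var.base_domain,
  )
}
```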

Configuring additional worker groups

Please note that you cannot use the names "master", "infra", "worker" or "storage" for additional worker groups. We prohibit these names to ensure there are no collisions between the generated node names of different worker groups.

As the examples below show, the attributes disk_size, state and affinity_group_ids for entries in additional_worker_groups are optional. If these attributes are omitted, the nodes are deployed with disk_size = var.root_disk_size, state = "Running" and affinity_group_ids = [].

To configure an additional worker group named "cpu1" with 3 instances with type "CPU-huge" the following input can be given:

# File main.tf
module "cluster" {
  // Remaining config for module omitted

  additional_worker_groups = {
    "cpu1": {
      size: "CPU-huge"
      count: 3
    }
  }
}

To configure an additional worker group named "storage1" with 3 instances with type "Storage-huge", and 5120GB of total disk size (120GB root disk + 5000GB data disk), the following input can be given:

# File main.tf
module "cluster" {
  // Remaining config for module omitted

  additional_worker_groups = {
    "storage1": {
      size: "Storage-huge"
      count: 3
      data_disk_size: 5000
    }
  }
}

Required credentials

  • An unrestricted Exoscale API key in the organisation in which the cluster should be deployed
  • An Exoscale API key for Floaty
    • The minimum required permissions for the Floaty API key are the following "compute-legacy" operations: addIpToNic, listNics, listResourceDetails, listVirtualMachines, queryAsyncJobResult and removeIpFromNic.
  • An API token for the Servers API must be created on control.vshn.net
  • A project access token for the APPUiO hieradata repository must be created on git.vshn.net
    • The minimum required permissions for the project access token are api (to create MRs), read_repository (to clone the repo) and write_repository (to push to the repo).

VSHN service dependencies

Since the module manages a VSHN-specific Puppet configuration for the LB VMs, it needs access to some VSHN (https://www.vshn.ch) infrastructure:

  • The module makes requests to the control.vshn.net Servers API to register the LB VMs in VSHN's Puppet ENC (external node classifier)
  • The module needs access to the APPUiO hieradata on git.vshn.net to create the appropriate configuration for the LBs

Using the module outside VSHN

If you're interested in a version of the module which doesn't include VSHN-managed LBs, you can check out the standalone MVP LB configuration in commit 172e2a0.

⚠️ Please note that we're not actively developing the MVP LB configuration at the moment.

Optional features

Private network

⚠️ This mode is less polished than the default mode and we're currently not actively working on improving this mode.

Optionally, the OpenShift 4 cluster VMs can be provisioned solely in an Exoscale managed private network. To use this variation, set module variable use_privnet to true. If required, you can change the CIDR of the private network by setting variable privnet_cidr.
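A minimal sketch of the corresponding module configuration (the CIDR value is illustrative, and all other required inputs are omitted):

```hcl
module "cluster" {
  # Remaining config for module omitted

  # Provision all cluster VMs on a managed private network only.
  use_privnet  = true
  # Optional: override the private network CIDR (illustrative value).
  privnet_cidr = "172.18.200.0/24"
}
```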

When deploying the RHCOS VMs with a private network only, the VMs must first be provisioned in Stopped state, and then powered on in a subsequent apply step. Otherwise, the initial Ignition config run fails because the Ignition API is not reachable early enough in the boot process, as the network interface is also configured by Ignition in this scenario. This can be achieved by running the following sequence of terraform apply steps. The example assumes that the LBs and bootstrap node have been provisioned correctly already and that we're now provisioning the OCP4 master VMs.

for state in "Stopped" "Running" "Running"; do
  cat >override.tf <<EOF
module "cluster" {
  bootstrap_count = 1
  infra_count = 0
  worker_count = 0
  master_state = "${state}"
}
EOF
  terraform apply
done

Note: the second terraform apply with state = "Running" may not be required in all cases, but is there as a safeguard in case creation of DNS records fails in the first terraform apply with state = "Running".

terraform-openshift4-exoscale's Issues

Missing DNS records for services not using default apps domain

When not using the default apps domain for services like the console, as is done on APPUiO Cloud, the Terraform module does not create DNS records for those services. When setting up a cluster for APPUiO Cloud, the following records are missing:
cname.basedomain IN CNAME cname.apps.basedomain
api.basedomain IN CNAME cname.apps.basedomain
console.basedomain IN CNAME cname.apps.basedomain
registry.basedomain IN CNAME cname.apps.basedomain

Update Terraform to not use deprecated resources

Currently the OCP4 Exoscale Terraform module (and the vshn-lbaas-exoscale Terraform module) uses a number of resources which have been deprecated in recent releases of the Exoscale provider. Please note that some of the deprecated resources are actually broken in the latest release (0.40.1); e.g. with provider version 0.40.1 I'm intermittently getting errors like:

│ Error: failed to retrieve reverse DNS: malformed JSON response 500, "queryreversednsforpublicipaddressresponse" was expected.
│ {"errorcode":500,"cserrorcode":9999,"errortext":"Internal error. It might be temporary, try again later and/or contact <[email protected]> if it persists.","uuidList":[]}
│ 
│   with module.cluster.module.lb.exoscale_ipaddress.ingress,
│   on .terraform/modules/cluster.lb/modules/vshn-lbaas-exoscale/main.tf line 35, in resource "exoscale_ipaddress" "ingress":
│   35: resource "exoscale_ipaddress" "ingress" {

We should update this module and the imported vshn-lbaas-exoscale module to ensure we only use supported resources.

Side-note: this will create a functional regression for the module, as the new floating IP resource exoscale_elastic_ip doesn't support setting the reverse DNS name yet (cf. exoscale/terraform-provider-exoscale#188).

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

github-actions
.github/workflows/release.yaml
  • actions/checkout v4
  • mikepenz/release-changelog-builder-action v4
  • ncipollo/release-action v1
.github/workflows/test.yaml
  • actions/checkout v4
  • hashicorp/setup-terraform v3
terraform
bootstrap.tf
control_plane.tf
infra.tf
lb.tf
  • github.com/appuio/terraform-modules v6.0.0
modules/node-group/providers.tf
  • exoscale 0.55.0
  • hashicorp/terraform >= 1.3.0
provider.tf
  • exoscale 0.55.0
  • gitfile 1.0.0
  • hashicorp/terraform >= 1.3.0
storage.tf
worker.tf

  • Check this box to trigger a request for Renovate to run again on this repository

Support groups of nodes with a distinct set of labels and taints (node groups)

Summary

As a solution engineer
I want to be able to provision clusters with mixed worker node sizes
So that I can implement the customer's requirements

Context

Modify Terraform module for OpenShift4 on Exoscale to support mixed worker node types and document how to migrate existing cluster Terraform states to the new module version.

Further links

Acceptance criteria

  • OCP4 clusters on Exoscale with mixed worker node sizes can be provisioned by Terraform

Implementation Ideas

  • Keep current worker config as "default worker group"
  • Change Terraform module to accept additional worker group configurations as map of worker group name to worker group parameters (instance type, disk size, etc.) and instantiate the internal node-group module for each entry

Restrict SSH access to cluster VMs

Summary

We want to restrict SSH access (port 22) to the cluster VMs. SSH access to the cluster VMs should only be allowed from the two LBs instead of allowing SSH access to all nodes from anywhere.

Acceptance criteria

  • SSH access to cluster VMs only allowed from LBs
  • Commodore component updated to use latest version of the Terraform module
  • Rolled out for all clusters managed with the Terraform module (shouldn't need maintenance, since the security group rules are managed individually)

Implementation ideas

  • Change the SSH access security group rules to allow access to the LBs from anywhere, and to allow access to the other nodes only from the LBs. Current security group rules:

    resource "exoscale_security_group_rule" "all_machines_ssh_v4" {
      security_group_id = exoscale_security_group.all_machines.id
      description       = "SSH Access"
      type              = "INGRESS"
      protocol          = "TCP"
      start_port        = "22"
      end_port          = "22"
      cidr              = "0.0.0.0/0"
    }

    resource "exoscale_security_group_rule" "all_machines_ssh_v6" {
      security_group_id = exoscale_security_group.all_machines.id
      description       = "SSH Access"
      type              = "INGRESS"
      protocol          = "TCP"
      start_port        = "22"
      end_port          = "22"
      cidr              = "::/0"
    }
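A possible shape for the restricted rules, assuming a dedicated security group for the LBs (the name exoscale_security_group.lb is hypothetical) and using the provider's user_security_group_id argument to reference a source security group. This is a sketch of the idea, not the implemented change:

```hcl
# Sketch: keep SSH to the LBs open from anywhere...
resource "exoscale_security_group_rule" "lb_ssh_v4" {
  security_group_id = exoscale_security_group.lb.id
  description       = "SSH access to LBs"
  type              = "INGRESS"
  protocol          = "TCP"
  start_port        = "22"
  end_port          = "22"
  cidr              = "0.0.0.0/0"
}

# ...and allow SSH to the other machines only from the LB security group.
resource "exoscale_security_group_rule" "all_machines_ssh_from_lb" {
  security_group_id      = exoscale_security_group.all_machines.id
  description            = "SSH access from LBs"
  type                   = "INGRESS"
  protocol               = "TCP"
  start_port             = "22"
  end_port               = "22"
  user_security_group_id = exoscale_security_group.lb.id
}
```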

Dependency Dashboard

This issue provides visibility into Renovate updates and their statuses. Learn more

This repository currently has no open or pending branches.

Validation Error in additional_worker_groups

The validation message states:

Your configuration of `additional_worker_groups` violates one of the following
constraints:
 * The worker count cannot be less than 0.

But the validation is done like this:

variable "additional_worker_groups" {
  ...
  validation {
    condition = alltrue([
      for k, v in var.additional_worker_groups :
        ...
        v.count > 0 &&
        ...
      ])
  }
}
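If the intended constraint is the one stated in the error message, the condition would need >= instead of >, so that a count of 0 remains valid. A sketch of the corrected validation (the surrounding variable definition is abbreviated):

```hcl
variable "additional_worker_groups" {
  # ... type and default omitted ...
  validation {
    condition = alltrue([
      for k, v in var.additional_worker_groups :
      # Allow empty worker groups; only negative counts are invalid.
      v.count >= 0
    ])
    error_message = "The worker count cannot be less than 0."
  }
}
```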
