certman-operator's Introduction

certman-operator

About

The Certman Operator is used to automate the provisioning and management of TLS certificates from Let's Encrypt for OpenShift Dedicated clusters provisioned via https://cloud.redhat.com/.

At a high level, Certman Operator is responsible for:

  • Provisioning Certificates after a cluster's successful installation.
  • Reissuing Certificates prior to their expiry.
  • Revoking Certificates upon cluster decommissioning.

Dependencies

Go: 1.19

Operator-SDK: 1.21.0

Hive: v1

Certman Operator is currently dependent on Hive. Hive is an API-driven OpenShift operator providing OpenShift Dedicated cluster provisioning and management.

Specifically, Hive provides a namespace-scoped CustomResourceDefinition called ClusterDeployment. Certman watches the Installed field in the spec of each instance of that CRD and attempts to provision certificates for the cluster once this field is true. Hive is also responsible for deploying the certificates to the cluster via SyncSets.

Only Hive v1 will work with this release.
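
As a rough illustration of the gating described above, the sketch below decides whether a cluster is ready for certificate provisioning. The clusterDeployment struct is a hypothetical stand-in for Hive's ClusterDeployment type, not the real API. The managed-label check is an assumption based on the api.openshift.com/managed label and the "not a managed cluster" log line that appear in the issues further down this page.

```go
package main

import "fmt"

// managedLabel is the label carried by the managed ClusterDeployment shown in
// this document; treating it as a precondition is an assumption.
const managedLabel = "api.openshift.com/managed"

// clusterDeployment is a hypothetical, trimmed-down stand-in for Hive's
// ClusterDeployment resource. The real type lives in the Hive API package.
type clusterDeployment struct {
	Labels    map[string]string
	Installed bool // mirrors spec.installed
}

// needsCertificateRequest reports whether certificates should be requested:
// the cluster must be labelled as managed and must have finished installing.
func needsCertificateRequest(cd clusterDeployment) bool {
	if cd.Labels[managedLabel] != "true" {
		return false // corresponds to the "not a managed cluster" case
	}
	return cd.Installed
}

func main() {
	cd := clusterDeployment{
		Labels:    map[string]string{managedLabel: "true"},
		Installed: true,
	}
	fmt.Println(needsCertificateRequest(cd))
}
```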

How the Certman Operator works

  1. A new OpenShift Dedicated cluster is requested from https://cloud.redhat.com.
  2. The clusterdeployment controller's Reconcile function watches the Installed field of the ClusterDeployment CRD (as explained above). Once the Installed field becomes true, a CertificateRequest resource is created for that cluster.
  3. Certman operator will then request new certificates from Let’s Encrypt based on the populated spec fields of the CertificateRequest CRD.
  4. To prove ownership of the domain, Certman will attempt to answer the Let’s Encrypt DNS-01 challenge by publishing the _acme-challenge subdomain in the cluster’s DNS zone with a TTL of 1 min.
  5. Certman waits for the record to propagate and then verifies that the challenge subdomain exists by using Cloudflare's DNS over HTTPS service. Certman will retry verification up to 5 times before returning an error.
  6. Once the challenge subdomain record has been verified, Let’s Encrypt can verify that you are in control of the domain’s DNS.
  7. Let’s Encrypt will issue certificates once the challenge has been successfully completed. Certman will then delete the challenge subdomain as it is no longer required.
  8. Certificates are then stored in a secret on the management cluster. Hive watches for this secret.
  9. Once the secret contains valid certificates for the cluster, Hive will sync the secrets over to the OpenShift Dedicated cluster using a SyncSet.
  10. Certman operator will reconcile all CertificateRequests every 10 minutes by default. During this reconciliation loop, certman checks the validity of the existing certificates. Once a certificate is within 45 days of expiry, it is reissued and the secret is updated. Reissuing certificates this early avoids certificate expiry notification emails from Let’s Encrypt.
  11. Updates to secrets on certificate reissuance will trigger the Hive controller’s reconciliation loop, which will force a SyncSet of the new secret to the OpenShift Dedicated cluster. OpenShift will detect that the secret has changed and will apply the new certificates to the cluster.
  12. When an OpenShift Dedicated cluster is decommissioned, all valid certificates are first revoked and then the secret is deleted on the management cluster. Hive will then continue deleting the other cluster resources.

Limitations

  • As described above in dependencies, Certman Operator requires Hive for custom resources and actual deployment of certificates. It is therefore not a suitable "out-of-the-box" solution for Let's Encrypt certificate management. For this, we recommend using either openshift-acme or cert-manager. Certman Operator is ideal for use cases when a large number of OpenShift clusters have to be managed centrally.
  • Certman Operator currently supports DNS challenges only through AWS Route53. There are plans for GCP support. HTTP challenges are not supported.
  • Certman Operator does not support creation of Let's Encrypt accounts at this time. You must already have a Let's Encrypt account and keys that you can provide to the Certman Operator.
  • Certman Operator does NOT configure the TLS certificates in an OpenShift cluster. This is managed by Hive using SyncSet.
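
For readers unfamiliar with the DNS challenge that Certman answers through Route53, the sketch below shows how the record name and TXT value are derived under RFC 8555 (ACME). It is a generic illustration rather than Certman's own code; the function names are hypothetical, and the key authorization in main is a placeholder string.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// challengeRecordName returns the DNS name that holds the DNS-01 TXT record.
// A leading wildcard label is dropped, because ACME validates the base name.
func challengeRecordName(domain string) string {
	return "_acme-challenge." + strings.TrimPrefix(domain, "*.")
}

// challengeRecordValue returns the TXT value for a key authorization:
// the unpadded base64url encoding of its SHA-256 digest.
func challengeRecordValue(keyAuthorization string) string {
	sum := sha256.Sum256([]byte(keyAuthorization))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	// "token.thumbprint" is a placeholder, not a real key authorization.
	fmt.Println(challengeRecordName("*.apps.example.com"))
	fmt.Println(challengeRecordValue("token.thumbprint"))
}
```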

CustomResourceDefinitions

The Certman Operator relies on the following custom resource definitions (CRDs):

  • CertificateRequest, which provides the details needed to request a certificate from Let's Encrypt.

  • ClusterDeployment, which defines a targeted OpenShift managed cluster. The Operator ensures at all times that the OpenShift managed cluster has valid certificates for control plane and pre-defined external routes.

Setup Certman Operator

For local development, you can use either minishift or minikube to develop and run the operator. You will also need to install the operator-sdk.

Local development testing

The script hack/test/local_test.sh can be used to automate local testing by creating a minikube cluster and deploying certman-operator and its dependencies.

Certman Operator Configuration

A ConfigMap is used to store certman operator configuration. The ConfigMap contains one value, default_notification_email_address, the email address to which Let's Encrypt certificate expiry notifications should be sent.

oc create configmap certman-operator \
    --from-literal=default_notification_email_address=<email-address>

Certman Operator Secrets

There are two secrets required for certman-operator to function.

  1. lets-encrypt-account - This secret is used to store the Let's Encrypt account url and keys.
# To fetch the "lets-encrypt-account" secret for a cluster on the Hive shard.
oc -n certman-operator get secret lets-encrypt-account -oyaml

For testing purposes:

# On the staging cluster:
oc -n certman-operator create secret generic lets-encrypt-account \
    --from-file=private-key=private-key.pem \
    --from-file=account-url=account.txt
  2. aws or gcp - Depending on the platform in use (AWS or GCP), this secret contains the cloud platform credentials for the target cluster's account.
# To fetch the "aws" secret for a cluster on the Hive shard.
NAMESPACE=$(oc get cd -A | grep -i $CLUSTERNAME | awk '{ print $1 }')
oc -n $NAMESPACE get secret aws -oyaml

For testing purposes:

# To create the "aws" secret on staging cluster for testing.
oc -n certman-operator create secret generic aws \
    --from-literal=aws_access_key_id=XXX \
    --from-literal=aws_secret_access_key=YYYY

NOTE:

  • The 'aws' secret is required only for non-STS clusters on AWS. STS clusters do not have this secret.

  • For testing purposes, both secrets (i.e. the lets-encrypt-account secret and the aws/gcp platform credential secret) can be found on the Hive shard of the staging cluster.
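
Before starting the operator, it can help to confirm that a secret carries the keys it is expected to hold. The sketch below is a hypothetical helper, not part of certman-operator. It checks a secret's data map for the key names used in the oc create secret commands above (private-key and account-url for lets-encrypt-account).

```go
package main

import (
	"fmt"
	"sort"
)

// missingKeys returns the required keys that are absent or empty in a
// secret's data, sorted for stable output.
func missingKeys(data map[string][]byte, required ...string) []string {
	var missing []string
	for _, key := range required {
		if len(data[key]) == 0 {
			missing = append(missing, key)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Placeholder content; a real secret would hold the PEM-encoded key.
	secret := map[string][]byte{"private-key": []byte("placeholder PEM data")}
	fmt.Println(missingKeys(secret, "private-key", "account-url"))
}
```

The same helper could be applied to the aws secret with aws_access_key_id and aws_secret_access_key as the required keys.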

Custom Resource Definitions (CRDs)

Create Hive CRDs

git clone git@github.com:openshift/hive.git
oc create -f hive/config/crds

Create Certman Operator CRDs

oc create -f https://raw.githubusercontent.com/openshift/certman-operator/master/deploy/crds/certman.managed.openshift.io_certificaterequests.yaml

Run Operator From Source

WATCH_NAMESPACE="certman-operator" OPERATOR_NAME="certman-operator" go run main.go

Build Operator Image

To build the certman-operator image, follow the documentation.

Setup & Deploy Operator On OpenShift/Kubernetes Cluster

Create & Use certman-operator Project

oc new-project certman-operator

Setup Service Account

oc create -f deploy/service_account.yaml

Setup RBAC

oc create -f deploy/role.yaml
oc create -f deploy/role_binding.yaml

Deploy the Operator

Edit deploy/operator.yaml, substituting the reference to the image you built above. Then deploy it:

oc create -f deploy/operator.yaml

Metrics

certman_operator_certs_in_last_day_openshift_com reports how many certs have been issued for Openshift.com in the last 24 hours.

certman_operator_certs_in_last_day_openshift_apps_com reports how many certs have been issued for Openshiftapps.com in the last 24 hours.

certman_operator_certs_in_last_week_openshift_com reports how many certs have been issued for Openshift.com in the last 7 days.

certman_operator_certs_in_last_week_openshift_apps_com reports how many certs have been issued for Openshiftapps.com in the last 7 days.

certman_operator_duplicate_certs_in_last_week reports how many certs have had duplication issues.

certman_operator_certificate_valid_duration_days reports how many days remain before a certificate expires.

Additional record for control plane certificate

Certman Operator always creates a certificate for the control plane of the clusters Hive builds. By passing a string into the pod as an environment variable named EXTRA_RECORD, Certman Operator can add an additional record to the SAN of the certificate for the API servers. This string should be the short hostname without the domain; the new record uses the same domain as the rest of the cluster. Example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: certman-operator
spec:
  template:
    spec:
    ...
      env:
      - name: EXTRA_RECORD
        value: "myapi"

The example will add myapi.<clustername>.<clusterdomain> to the certificate of the control plane.
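
The name construction described above can be sketched as follows. This is an illustration under stated assumptions: extraAPIName is a hypothetical helper, and the cluster name and domain in main are placeholder values.

```go
package main

import (
	"fmt"
	"os"
)

// extraAPIName builds the additional SAN entry from the short hostname in
// EXTRA_RECORD plus the cluster's name and domain. An empty record yields an
// empty string, meaning no extra entry is added.
func extraAPIName(record, clusterName, clusterDomain string) string {
	if record == "" {
		return ""
	}
	return fmt.Sprintf("%s.%s.%s", record, clusterName, clusterDomain)
}

func main() {
	record := os.Getenv("EXTRA_RECORD") // unset means no extra record
	// "mycluster" and "example.com" are placeholders for illustration.
	fmt.Println(extraAPIName(record, "mycluster", "example.com"))
}
```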

License

Certman Operator is licensed under the Apache 2.0 license. See the LICENSE file for details.

certman-operator's People

Contributors

2uasimojo, aliceh, anispate, bdematte, blrm, c-e-brumm, cblecker, dependabot[bot], dofinn, dustman9000, fahlmant, iamkirkbater, jewzaam, jharrington22, lisa, luis-falcon, mbarnes, mjlshen, mjudeikis, mrbarge, ninataneja, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, ravitri, sedroche, tnierman, tparikh, wanghaoran1988, yithian

certman-operator's Issues

Add documentation to the README.md

  • Add documentation to the repository to explain the dependencies on hive.
  • Add instructions to develop the operator.
  • Add instructions to debug the operator.
  • Document steps to install the operator on a Kubernetes/OpenShift cluster.

Add unit and integration tests

At the moment, the code base is very light on tests. Add tests for

  • Issue Certificates

  • Renew Certificates

  • Revoke Certificate

Reconcile error failing for Openstack clouds. (Openstack support missing)

I'm deploying an OpenShift cluster via Hive on both RHOSP and vanilla OpenStack, and I hit the same issue on each.
After the cluster is deployed successfully, certman-operator attempts to change spec.platform, which is immutable.

Here are the logs:

{"level":"info","ts":1604925442.9728994,"logger":"controller_clusterdeployment","msg":"reconciling ClusterDeployment","Request.Namespace":"cluster-stakater-binero","Request.Name":"openshift-binero"}
{"level":"info","ts":1604925442.9729996,"logger":"controller_clusterdeployment","msg":"adding CertmanOperator finalizer to the ClusterDeployment","Request.Namespace":"cluster-stakater-binero","Request.Name":"openshift-binero"}
{"level":"error","ts":1604925443.152129,"logger":"controller_clusterdeployment","msg":"error addming finalizer to ClusterDeployment","Request.Namespace":"cluster-stakater-binero","Request.Name":"openshift-binero","error":"admission webhook \"clusterdeploymentvalidators.admission.hive.openshift.io\" denied the request: Attempted to change ClusterDeployment.Spec.Platform. ClusterDeployment.Spec is immutable except for [CertificateBundles ClusterMetadata ControlPlaneConfig Ingress Installed PreserveOnDelete ClusterPoolRef PowerState HibernateAfter]","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/openshift/certman-operator/pkg/controller/clusterdeployment.(*ReconcileClusterDeployment).Reconcile\n\t/workdir/pkg/controller/clusterdeployment/clusterdeployment_controller.go:170\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88"}
{"level":"info","ts":1604925443.1522331,"logger":"controller_clusterdeployment","msg":"Reconcile complete.","Request.Namespace":"cluster-stakater-binero","Request.Name":"openshift-binero","Duration":0.179270139}
{"level":"error","ts":1604925443.1522815,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"clusterdeployment-controller","request":"cluster-stakater-binero/openshift-binero","error":"admission webhook \"clusterdeploymentvalidators.admission.hive.openshift.io\" denied the request: Attempted to change ClusterDeployment.Spec.Platform. ClusterDeployment.Spec is immutable except for [CertificateBundles ClusterMetadata ControlPlaneConfig Ingress Installed PreserveOnDelete ClusterPoolRef PowerState HibernateAfter]","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88"}
{"level":"info","ts":1604925667.8630793,"logger":"controller_clusterdeployment","msg":"reconciling ClusterDeployment","Request.Namespace":"cluster-enento-devtest","Request.Name":"cluster"}

My ClusterDeployment after successful installation:

apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  name: openshift-binero
  namespace: cluster-stakater-binero
  finalizers:
    - hive.openshift.io/deprovision
  labels:
    api.openshift.com/managed: 'true'
    hive.openshift.io/version-major: '4'
    hive.openshift.io/version-major-minor-patch: 4.5.4
    hive.openshift.io/version-major-minor: '4.5'
    hive.openshift.io/cluster-platform: openstack
    hive.openshift.io/cluster-region: unknown
    cluster: binero
spec:
  controlPlaneConfig:
    servingCertificates:
      default: serving-cert
  clusterName: binero
  clusterMetadata:
    adminKubeconfigSecretRef:
      name: openshift-binero-0-vprlh-admin-kubeconfig
    adminPasswordSecretRef:
      name: openshift-binero-0-vprlh-admin-password
    clusterID: 4493c425-e79f-4ea0-8332-559f0c8b1385
    infraID: binero-qb5wz
  provisioning:
    imageSetRef:
      name: openshift-cluster-imageset-v4.5.4
    installConfigSecretRef:
      name: install-config
    sshPrivateKeySecretRef:
      name: ssh-key
  platform:
    openstack:
      cloud: binero
      credentialsSecretRef:
        name: cloud-creds
  ingress:
    - domain: <domain>
      name: default
      servingCertificate: serving-cert
  baseDomain: <basedomain>
  certificateBundles:
    - certificateSecretRef:
        name: serving-cert-secret-generated
      generate: true
      name: serving-cert
  installed: true
  pullSecretRef:
    name: image-pull-secret
status:
  apiURL: 'https://api.<domain>:6443'
  cliImage: >-
    quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ddf1330bf78848849738ba17ae256ec2fb6d6ebe357f8857eefad913d4081eed
  conditions:
    - lastProbeTime: '2020-11-09T11:38:27Z'
      lastTransitionTime: '2020-11-09T11:38:27Z'
      message: >-
        secret serving-cert-secret-generated for certbundle serving-cert was not
        found
      reason: IngressCertificateNotFound
      status: 'True'
      type: IngressCertificateNotFound
    - lastProbeTime: '2020-11-09T11:40:26Z'
      lastTransitionTime: '2020-11-09T11:40:26Z'
      message: Successfully launched install pod
      reason: InstallLaunchSuccessful
      status: 'False'
      type: InstallLaunchError
    - lastProbeTime: '2020-11-09T12:17:44Z'
      lastTransitionTime: '2020-11-09T12:17:44Z'
      message: One of the SyncSet applies has failed
      reason: SyncSetApplyFailure
      status: 'True'
      type: SyncSetFailed
    - lastProbeTime: '2020-11-09T12:16:44Z'
      lastTransitionTime: '2020-11-09T12:16:44Z'
      message: >-
        One or more serving certificates for the cluster control plane are
        missing
      reason: ControlPlaneCertificatesNotFound
      status: 'True'
      type: ControlPlaneCertificateNotFound
    - lastProbeTime: '2020-11-09T12:16:45Z'
      lastTransitionTime: '2020-11-09T12:16:45Z'
      message: cluster is reachable
      reason: ClusterReachable
      status: 'False'
      type: Unreachable
  installedTimestamp: '2020-11-09T12:16:44Z'
  installerImage: >-
    quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6d71a2c911976c33255b57277d62a8dee2972c302315a87d1f9ab64884cbbb1b
  provisionRef:
    name: openshift-binero-0-vprlh
  webConsoleURL: 'https://console-openshift-console.<domain>'

Certman Operator not working

I have an OpenShift 4.3.12 cluster with Hive deployed on it, and I used Hive to create an OpenShift 4.3.2 cluster on Azure. My ClusterDeployment instance has installed set to true. I deployed Certman Operator using the manifests in the deploy folder without an image tag, so the latest image should have been deployed, but Certman is not creating any CertificateRequest. The logs are:

{"level":"info","ts":1589193396.991396,"logger":"userMetrics","msg":"Metrics Route object updated Route.Name certman-operator and Route.Namespace certman-operator"}
{"level":"info","ts":1589193396.9914522,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1589193431.2917986,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"clusterdeployment-controller"}
{"level":"info","ts":1589193431.2921095,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"certificaterequest-controller"}
{"level":"info","ts":1589193431.3921175,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"clusterdeployment-controller","worker count":1}
{"level":"info","ts":1589193431.3922951,"logger":"controller_clusterdeployment","msg":"reconciling ClusterDeployment","Request.Namespace":"cluster-azure","Request.Name":"openshift-cluster-azure"}
{"level":"info","ts":1589193431.392353,"logger":"controller_clusterdeployment","msg":"not a managed cluster","Request.Namespace":"cluster-azure","Request.Name":"openshift-cluster-azure"}
{"level":"info","ts":1589193431.3924532,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"certificaterequest-controller","worker count":1}

Remove the "staging" field from CRD

The Let's Encrypt staging or production endpoint should be passed to the operator as a configuration parameter. There is no real case where we would request certificates from the Let's Encrypt staging API endpoint in a production environment.
