
Comments (14)

chanwit commented on July 30, 2024

A generic health check would be a great idea, @phoban01.
I'll open another issue to track it as a follow-up to this one!

from tf-controller.

tomhuang12 commented on July 30, 2024

Could you provide a little more detail on what you are envisioning?


chanwit commented on July 30, 2024

Maybe we can focus only on network-based health checks first. Something like this:

apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
spec:
  path: ./terraform
  healthChecks:
  - name: rds
    kind: TCP
    address: ${output.rds_ip_address}:3306
    timeout: 15m
  - name: nginx
    kind: HTTP
    address: https://${output.nginx_ip_address}/ping
    timeout: 15m


chanwit commented on July 30, 2024
  - name: nginx
    type: httpPost
    url: https://${output.nginx_ip_address}/post
    timeout: 15m

We should not include the HTTP POST method there, as a POST normally requires a body. Please go with HTTP and assume that it's always the GET method, to keep things simple.

In which state/conditions should the health checks be performed? After apply?

Yes, after Apply succeeded.

  • The HealthCheck condition is recorded as HealthCheck = unknown, with the reason "HealthChecking".

Should the health check result change Ready state and any conditions? Add a new healthCheck condition?

Yes to both.

  • The Ready state becomes Ready = true, and the reason changes from AppliedSucceed to HealthCheckSucceed.
  • The HealthCheck condition is recorded as HealthCheck = true. The reason can be the same value as above.

Does it perform a health check on every loop?

  • No, we could add an if so that it checks only after the Apply step is performed.

When the health check succeeds, what should it do in the next loop?

  • When the health check is done, please return the Terraform object with a nil error. This will pass through the reconcile logic and continue normally with Result{RetryAfter: the interval period}.

When the health check fails, what should it do in the next loop?

  • When checking fails, we'll do another round with Result{RetryAfter: health check interval} and an error.

What if the user makes a change to the terraform file when the health check is in a failed state?

  • The current logic should make it through the normal loop. We could add an if condition to handle the cases where things go wrong.


phoban01 commented on July 30, 2024

Might want to consider adding a successThreshold and failureThreshold to avoid transient network errors triggering health-check failures?

Generally, I think the applicability of TCP/HTTP health checks could be limited, as there may often be no network connectivity between the resources being created and the tf-controller. If outputs could be passed to exec, then something like the following would be more practical:

healthChecks:
- name: bucket
  exec: "aws s3api head-bucket --bucket {{ outputs.bucket_name }}"
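
A rough Go sketch of how outputs might be passed to an exec check, using the standard text/template package. Note that with text/template, data fields are normally accessed with a leading dot, so the sketch uses {{ .outputs.bucket_name }} rather than the form shown above. The helper names renderCommand and runCommand are hypothetical, and running the result through "sh -c" is an assumption that would need care, since output values end up in a shell command line.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"text/template"
	"time"
)

// renderCommand substitutes Terraform outputs into a command template.
// Unknown output names produce an error instead of an empty string.
// Hypothetical helper for illustration.
func renderCommand(tmpl string, outputs map[string]string) (string, error) {
	t, err := template.New("check").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	data := map[string]interface{}{"outputs": outputs}
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// runCommand executes the rendered command through the shell.
// A non-zero exit status is reported as an error, meaning the check failed.
func runCommand(ctx context.Context, command string) error {
	return exec.CommandContext(ctx, "sh", "-c", command).Run()
}

func main() {
	// A harmless stand-in for the aws command, so the sketch runs anywhere with a shell.
	cmd, err := renderCommand("echo checking {{ .outputs.bucket_name }}",
		map[string]string{"bucket_name": "demo-bucket"})
	if err != nil {
		panic(err)
	}

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	fmt.Println(cmd, "->", runCommand(ctx, cmd))
}
```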


chanwit commented on July 30, 2024

Of course, please use what you see fit.

Mine was just an example.


tomhuang12 commented on July 30, 2024

Do you mean Flagger's metrics analysis?


chanwit commented on July 30, 2024

This part:
https://docs.flagger.app/usage/webhooks


tomhuang12 commented on July 30, 2024

Thanks for the details. I implemented the basics of the health checks in the draft PR above. I renamed some spec fields for clarity:

apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
spec:
  path: ./terraform
  healthChecks:
  - name: rds
    type: tcp
    url: ${output.rds_ip_address}:3306
    timeout: 15m
  - name: nginx
    type: httpGet
    url: https://${output.nginx_ip_address}/ping
    timeout: 15m
  - name: nginx
    type: httpPost
    url: https://${output.nginx_ip_address}/post
    timeout: 15m

The next step is incorporating the functions into the reconcile loop. I would like to discuss the overall health check flow with you all.

  • In which state/conditions should the health checks be performed? After apply?
  • Should the health check result change the Ready state and any conditions? Add a new healthCheck condition?
  • Does it perform a health check on every loop?
  • When the health check succeeds, what should it do in the next loop?
  • When the health check fails, what should it do in the next loop?
  • What if the user makes a change to the Terraform file when the health check is in a failed state?

Once we have a consensus, I will diagram the flow for clarity.


tomhuang12 commented on July 30, 2024

When checking fails, we'll do another round with Result{RetryAfter: health check interval} and an error.

Does that mean we need another RetryInterval param for each health check?


chanwit commented on July 30, 2024

Interesting. Could we?


tomhuang12 commented on July 30, 2024

To get the output URL, can I use the Go template format {{ outputs.url }} instead of Terraform's format?


tomhuang12 commented on July 30, 2024

Are we good to close this?


chanwit commented on July 30, 2024

Fixed by #53
Thank you @tomhuang12 !!

