Comments (14)
Generic health check would be a great idea, @phoban01
I'll open another issue to track it as a follow-up to this one!
from tf-controller.
Could you provide a few more details on what you are envisioning?
from tf-controller.
Maybe we can focus only on network-based health checks first. Something like this:
apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
spec:
  path: ./terraform
  healthChecks:
  - name: rds
    kind: TCP
    address: ${output.rds_ip_address}:3306
    timeout: 15m
  - name: nginx
    kind: HTTP
    address: https://${output.nginx_ip_address}/ping
    timeout: 15m
from tf-controller.
- name: nginx
  type: httpPost
  url: https://${output.nginx_ip_address}/post
  timeout: 15m
We should not include the HTTP POST method there, as a POST normally requires body content. Please go with HTTP and assume that it's always the GET method to keep things simple.
In which state/conditions the health checks should be performed? After apply?
Yes, after Apply succeeded.
- The condition state of HealthCheck is recorded. HealthCheck = unknown. The reason is "HealthChecking".
Should the health check result change Ready state and any conditions? Add a new healthCheck condition?
Yes to both.
- The Ready state becomes Ready = true, and the reason changes from AppliedSucceed to HealthCheckSucceed.
- The condition state of HealthCheck is recorded. HealthCheck = true. The reason can be the same value as above.
Does it perform a health check on every loop?
- No, we could add an if so that it checks only after the Apply step is performed.
When the health check succeeds, what should it do in the next loop?
- When the health check is done, please return the Terraform object with a nil error. This will pass through the reconcile logic and proceed normally with Result{RetryAfter: the interval period}.
When the health check fails, what should it do in the next loop?
- When checking fails, we'll do another round with Result{RetryAfter: health check interval} and an error.
What if the user makes a change to the Terraform file when the health check is in a failed state?
- The current logic should make it through the normal loop. We could add an if condition to handle the cases where things go wrong.
from tf-controller.
Might want to consider adding a successThreshold and failureThreshold to avoid transient network errors triggering health-check failures?
Generally, I think the application of tcp/http healthchecks could be limited, as it might often be the case that there's no network connectivity between the resources being created and the tf-controller. If outputs could be passed to exec then something like the following would be more practical:
healthChecks:
- name: bucket
  exec: "aws s3api head-bucket --bucket {{ outputs.bucket_name }}"
from tf-controller.
Of course, please use what you see fit.
Mine was just an example.
from tf-controller.
Do you mean something like Flagger's metrics analysis?
from tf-controller.
This part:
https://docs.flagger.app/usage/webhooks
from tf-controller.
Thanks for the details. I implemented the basics of the health checks in the draft PR above. I renamed some specs for clarity:
apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
spec:
  path: ./terraform
  healthChecks:
  - name: rds
    type: tcp
    url: ${output.rds_ip_address}:3306
    timeout: 15m
  - name: nginx
    type: httpGet
    url: https://${output.nginx_ip_address}/ping
    timeout: 15m
  - name: nginx
    type: httpPost
    url: https://${output.nginx_ip_address}/post
    timeout: 15m
The next step is incorporating the functions into the reconcile loop. I would like to discuss the overall health check flow with you all.
- In which state/conditions the health checks should be performed? After apply?
- Should the health check result change Ready state and any conditions? Add a new healthCheck condition?
- Does it perform a health check on every loop?
- When the health check succeeds, what should it do in the next loop?
- When the health check fails, what should it do in the next loop?
- What if the user makes a change to the terraform file when the health check is in a failed state?
Once we have a consensus, I will diagram the flow for clarity.
from tf-controller.
When checking fails, we'll do another round with Result{RetryAfter: health check interval} and an error.
Does that mean we need another RetryInterval param for each health check?
from tf-controller.
Interesting. Could we?
from tf-controller.
To get the output URL, can I use the Go template format {{ outputs.url }} instead of Terraform's format?
from tf-controller.
Are we good to close this?
from tf-controller.
Fixed by #53
Thank you @tomhuang12 !!
from tf-controller.