Comments (14)

jdhirst commented on August 15, 2024

For those testing OKD who want to bypass this bug to get a UPI install to deploy, it is sufficient to destroy the old etcd-signer container using podman rm -f etcd-signer and redeploy it manually using the method from bootkube.sh.

Here is my example (replace KUBE_ETCD_SIGNER_SERVER_IMAGE with the same image used in your build; you can obtain it from podman ps):

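KUBE_ETCD_SIGNER_SERVER_IMAGE=registry.svc.ci.openshift.org/origin/4.4-2020-02-04-174205@sha256:8755e700accb5b6d92fd7d2c7b7a6252ed62f843f06fc31812b415a0ac47e0e1
# Remove the stuck etcd-signer container, then recreate it the way bootkube.sh does
podman rm -f etcd-signer
podman pull "${KUBE_ETCD_SIGNER_SERVER_IMAGE}"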
podman run --quiet --net=host \
	--name etcd-signer \
	--detach \
	--volume /opt/openshift/tls:/opt/openshift/tls:ro,z \
	"${KUBE_ETCD_SIGNER_SERVER_IMAGE}" \
	serve \
	--cacrt=/opt/openshift/tls/etcd-signer.crt \
	--cakey=/opt/openshift/tls/etcd-signer.key \
	--metric-cacrt=/opt/openshift/tls/etcd-metric-signer.crt \
	--metric-cakey=/opt/openshift/tls/etcd-metric-signer.key \
	--servcrt=/opt/openshift/tls/kube-apiserver-lb-server.crt \
	--servkey=/opt/openshift/tls/kube-apiserver-lb-server.key \
	--servcrt=/opt/openshift/tls/kube-apiserver-internal-lb-server.crt \
	--servkey=/opt/openshift/tls/kube-apiserver-internal-lb-server.key \
	--servcrt=/opt/openshift/tls/kube-apiserver-localhost-server.crt \
	--servkey=/opt/openshift/tls/kube-apiserver-localhost-server.key \
	--address=0.0.0.0:6443 \
	--insecure-health-check-address=0.0.0.0:6080 \
	--csrdir=/tmp \
	--peercertdur=26280h \
	--servercertdur=26280h \
	--metriccertdur=26280h

This allowed me to bootstrap without issue and get OKD deployed to test other components.
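
If you would rather not copy the image reference by hand, a small helper can read it from the existing container before it is removed. This is a minimal sketch, assuming podman runs as root on the bootstrap node and the container is still named etcd-signer; the function name signer_image is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the image reference of the existing etcd-signer
# container so it can be reused when recreating the container by hand.
# Assumes podman runs as root and the container is named etcd-signer.
signer_image() {
  # --all includes stopped containers; only the first match is printed.
  podman ps --all --filter name=etcd-signer --format '{{.Image}}' | head -n 1
}

# Example usage, before running podman rm -f etcd-signer:
#   KUBE_ETCD_SIGNER_SERVER_IMAGE=$(signer_image)
#   if [ -z "${KUBE_ETCD_SIGNER_SERVER_IMAGE}" ]; then
#     echo "no etcd-signer container found" >&2
#   fi
```

The value has to be captured before the container is removed, since podman ps no longer lists it afterwards.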

from okd.

vrutkovs commented on August 15, 2024

Current workaround: restart the pod or simply reboot the bootstrap node.
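
The first option can be sketched as below, assuming etcd-signer is a plain podman container on the bootstrap node (as in the manual workaround earlier in this thread). Whether a restart alone clears the stuck state is not confirmed here; rebooting is the fallback. The function name restart_signer is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart the etcd-signer container if podman still
# knows about it; otherwise suggest rebooting the bootstrap node instead.
restart_signer() {
  # podman container exists returns 0 only if the container is present.
  if podman container exists etcd-signer; then
    podman restart etcd-signer
  else
    echo "etcd-signer not found; reboot the bootstrap node (systemctl reboot)" >&2
    return 1
  fi
}
```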

vrutkovs commented on August 15, 2024

cri-o/cri-o#3227 is the fix for crio 1.17

vrutkovs commented on August 15, 2024

crio 1.17.0 landed a few days ago, and etcd-signer no longer gets stuck in my tests.
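
To check whether a node already has the newer crio, a version comparison along these lines can help. It is a sketch assuming GNU coreutils (sort -V) and an RPM-based host where the package is named cri-o; treating 1.17.0 as the cutoff follows this comment rather than a confirmed changelog entry. The function name version_ge is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return success if version $1 is greater than or
# equal to version $2, using GNU sort's version ordering (sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example usage on the node (assumes the RPM package is named cri-o):
#   installed=$(rpm -q --qf '%{VERSION}\n' cri-o)
#   if version_ge "$installed" 1.17.0; then echo "crio ${installed} is new enough"; fi
```

sort -V is used instead of a plain string comparison so that, for example, 1.17.10 correctly sorts after 1.17.9.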

vrutkovs commented on August 15, 2024

This seems to be intermittent: I've seen it in ~50% of all CI runs, independent of the platform.

It appears that the etcd-signer container is shown as running but is stuck and not approving CSRs.

jdhirst commented on August 15, 2024

I'm not sure it's intermittent as far as an individual release goes, because I have rebuilt my bootstrap node around 10 times so far on 4.4.0-0.okd-2020-01-29-103659, and the issue occurs every time.

However, when I use build 4.4.0-0.okd-2020-01-28-022517, the bootstrap node works fine, but the masters hit this around 50% of the time (the master issue seems to be intermittent).

UPDATE: Confirmed that this is also the case with 4.4.0-0.okd-2020-01-29-161855, so it seems to be something introduced in 4.4.0-0.okd-2020-01-29-103659.

vrutkovs commented on August 15, 2024

Does it fail every time on mirrored images? I've noticed CI's registry has become very unreliable recently, so it may have been caused by hidden pull errors. Could you give it a try?

openshift/installer#3013 would help eliminate those.

jdhirst commented on August 15, 2024

Yes, I have been using mirrored images, appending the imageContentSources key to the end of my install-config.yaml file. The issue occurs with both direct-to-CI installs and mirrored installs.
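
Appending the imageContentSources key can be scripted roughly as follows. This is a sketch: the mirror repository and source repositories in the usage example are placeholders, and the real values should come from the output of oc adm release mirror for your release. It also assumes install-config.yaml ends with a newline and does not already contain an imageContentSources key. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append an imageContentSources section to an
# install-config.yaml, mapping every given source repository to one mirror.
# Usage: append_image_content_sources CONFIG MIRROR SOURCE [SOURCE...]
append_image_content_sources() {
  local config=$1 mirror=$2 src
  shift 2
  {
    echo "imageContentSources:"
    for src in "$@"; do
      echo "- mirrors:"
      echo "  - ${mirror}"
      echo "  source: ${src}"
    done
  } >>"$config"
}

# Example usage (placeholder values):
#   append_image_content_sources install-config.yaml \
#     mirror.example.com:5000/okd/release \
#     registry.svc.ci.openshift.org/origin/release
```

All sources map to the same mirror here because a release mirrored into a single repository is the common setup; if your mirror layout differs, write one entry per source/mirror pair instead.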

jdhirst commented on August 15, 2024

Just tested 4.4.0-0.okd-2020-02-03-164031; the same issue occurs and the bootstrap etcd-signer fails:

2020-02-03T18:21:32.510878803+00:00 stderr F Error: error requesting certificate: error obtaining signed certificate from signer: timed out waiting for the condition
2020-02-03T18:21:32.512207454+00:00 stderr F Usage:
2020-02-03T18:21:32.512207454+00:00 stderr F   kube-client-agent request --FLAGS [flags]
2020-02-03T18:21:32.512207454+00:00 stderr F 
2020-02-03T18:21:32.512207454+00:00 stderr F Flags:
2020-02-03T18:21:32.512207454+00:00 stderr F       --assetsdir string    Directory location for the agent where it stores signed certs
2020-02-03T18:21:32.512207454+00:00 stderr F       --commonname string   Common name for the certificate being requested
2020-02-03T18:21:32.512207454+00:00 stderr F       --dnsnames string     Comma separated DNS names of the node to be provided for the X509 certificate
2020-02-03T18:21:32.512207454+00:00 stderr F   -h, --help                help for request
2020-02-03T18:21:32.512207454+00:00 stderr F       --ipaddrs string      Comma separated IP addresses of the node to be provided for the X509 certificate
2020-02-03T18:21:32.512207454+00:00 stderr F       --kubeconfig string   Path to the kubeconfig file to connect to apiserver. If "", InClusterConfig is used which uses the service account kubernetes gives to pods.
2020-02-03T18:21:32.512207454+00:00 stderr F       --orgname string      CA private key file for signer
2020-02-03T18:21:32.512207454+00:00 stderr F 
2020-02-03T18:21:32.512207454+00:00 stderr F ERROR: logging before flag.Parse: F0203 18:21:32.512137       7 main.go:18] Error executing kube-client-agent: error requesting certificate: error obtaining signed certificate from signer: timed out waiting for the condition
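
To check whether a bootstrap node is hitting this same failure, one option is to search the pod logs for the error string above. The lines use the CRI log format (timestamp, stream, F/P tag), so this sketch assumes they live under /var/log/pods, the usual kubelet pod log directory; adjust the path if your node stores them elsewhere. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list log files that contain the signer timeout error
# shown above. Defaults to /var/log/pods (an assumption about log location).
find_signer_timeouts() {
  local dir=${1:-/var/log/pods}
  # -r: recurse, -l: print matching file names only, -F: fixed string.
  grep -rlF 'error obtaining signed certificate from signer' "$dir"
}

# Example usage on the bootstrap node (as root):
#   find_signer_timeouts || echo "no signer timeouts found"
```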

vrutkovs commented on August 15, 2024

I'm pretty sure crun/conmon/podman from Fedora are unstable (CI gets stuck on this fairly often too). The symptoms are:

  • podman inspect etcd-signer says no such container
  • podman logs etcd-signer exits immediately

Usually the container comes up correctly on the next reboot.
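
The first symptom can be checked from a script. A minimal sketch, assuming it runs as root on the bootstrap node; note that a container reported as running can still be stuck, as described earlier in this thread, so a running status does not rule the problem out. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether podman still knows about etcd-signer
# and, if so, the state podman records for it.
check_signer() {
  local status
  # Declared separately so the exit status of podman inspect is preserved.
  if ! status=$(podman inspect --format '{{.State.Status}}' etcd-signer 2>/dev/null); then
    echo "etcd-signer: no such container"
    return 1
  fi
  echo "etcd-signer: ${status}"
}
```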

@openshift/sig-containers ^ any ideas on how to debug this?

vrutkovs commented on August 15, 2024

Pretty sure it's a podman bug: containers/podman#5109

vrutkovs commented on August 15, 2024

CRI-O and podman are not playing well together: crio finds podman containers, checks its DB, and since they are not there, it removes them. Sometimes this happens while etcd-signer is still running, so the signer's storage is removed and it hangs.

This is a crio bug; I'll attempt to work around it until the fix lands in machine-os-content.

sgreene570 commented on August 15, 2024

Thought I was hitting this, but then I remembered that the initial etcd cert expires 24 hours after creating the Ignition configs. 🤦‍♂️
https://bugzilla.redhat.com/show_bug.cgi?id=1726995
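
A quick way to rule this out is to compare the age of the generated Ignition configs against that 24-hour window. This sketch uses the file's modification time as a stand-in for when openshift-install created it, which only holds if the file has not been copied or touched since; it also relies on GNU stat (stat -c %Y). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if an Ignition config is younger than 24 hours,
# judged by its modification time (an approximation of its creation time).
ignition_age_ok() {
  local file=$1 max_age=$((24 * 60 * 60)) now mtime
  mtime=$(stat -c %Y "$file") || return 2
  now=$(date +%s)
  if (( now - mtime >= max_age )); then
    echo "${file} is older than 24h; consider regenerating the Ignition configs" >&2
    return 1
  fi
}

# Example usage in the installer asset directory:
#   ignition_age_ok bootstrap.ign && echo "bootstrap.ign is recent enough"
```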

Rouf111 commented on August 15, 2024

Hello,

I am experiencing the same issue when trying to deploy OKD 4.12 on user-provisioned bare metal.
Do you have any suggestions for the error message below?

Jan 29 21:12:34 linux etcdctl[24703]: {"level":"warn","ts":"2023-01-29T21:12:34.131Z","logger":"client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00021c000/localhost:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp [::1]:2379: connect: connection refused\""}
Jan 29 21:12:34 linux etcdctl[24703]: https://localhost:2379 is unhealthy: failed to commit proposal: context deadline exceeded