Comments (24)
error
I0529 21:32:03.564574 9867 kubectl.go:593] copying /workspace/kubernetes/platforms/linux/amd64/kubectl to the httpd pod
I0529 21:32:03.564635 9867 builder.go:121] Running '/workspace/kubernetes/platforms/linux/amd64/kubectl --server=https://34.83.180.141 --kubeconfig=/workspace/.kube/config --namespace=kubectl-6332 cp /workspace/kubernetes/platforms/linux/amd64/kubectl kubectl-6332/httpd:/tmp/'
I0529 21:32:09.340487 9867 builder.go:135] rc: 1
I0529 21:32:09.340623 9867 builder.go:91] Unexpected error:
<exec.CodeExitError>:
error running /workspace/kubernetes/platforms/linux/amd64/kubectl --server=https://34.83.180.141 --kubeconfig=/workspace/.kube/config --namespace=kubectl-6332 cp /workspace/kubernetes/platforms/linux/amd64/kubectl kubectl-6332/httpd:/tmp/:
Command stdout:
stderr:
E0529 21:32:09.334805 17304 v2.go:167] "Unhandled Error" err="next reader: read tcp 10.34.160.50:50460->34.83.180.141:443: read: connection reset by peer"
E0529 21:32:09.334808 17304 v2.go:129] "Unhandled Error" err="next reader: read tcp 10.34.160.50:50460->34.83.180.141:443: read: connection reset by peer"
E0529 21:32:09.334853 17304 v2.go:150] "Unhandled Error" err="next reader: read tcp 10.34.160.50:50460->34.83.180.141:443: read: connection reset by peer"
error: error reading from error stream: next reader: read tcp 10.34.160.50:50460->34.83.180.141:443: read: connection reset by peer
error:
exit status 1
{
Err: <*errors.errorString | 0xc0046ce9b0>{
s: "error running /workspace/kubernetes/platforms/linux/amd64/kubectl --server=https://34.83.180.141 --kubeconfig=/workspace/.kube/config --namespace=kubectl-6332 cp /workspace/kubernetes/platforms/linux/amd64/kubectl kubectl-6332/httpd:/tmp/:\nCommand stdout:\n\nstderr:\nE0529 21:32:09.334805 17304 v2.go:167] \"Unhandled Error\" err=\"next reader: read tcp 10.34.160.50:50460->34.83.180.141:443: read: connection reset by peer\"\nE0529 21:32:09.334808 17304 v2.go:129] \"Unhandled Error\" err=\"next reader: read tcp 10.34.160.50:50460->34.83.180.141:443: read: connection reset by peer\"\nE0529 21:32:09.334853 17304 v2.go:150] \"Unhandled Error\" err=\"next reader: read tcp 10.34.160.50:50460->34.83.180.141:443: read: connection reset by peer\"\nerror: error reading from error stream: next reader: read tcp 10.34.160.50:50460->34.83.180.141:443: read: connection reset by peer\n\nerror:\nexit status 1",
},
Code: 1,
}
[FAILED] error running /workspace/kubernetes/platforms/linux/amd64/kubectl --server=https://34.83.180.141 --kubeconfig=/workspace/.kube/config --namespace=kubectl-6332 cp /workspace/kubernetes/platforms/linux/amd64/kubectl kubectl-6332/httpd:/tmp/:
Command stdout:
stderr:
E0529 21:32:09.334805 17304 v2.go:167] "Unhandled Error" err="next reader: read tcp 10.34.160.50:50460->34.83.180.141:443: read: connection reset by peer"
E0529 21:32:09.334808 17304 v2.go:129] "Unhandled Error" err="next reader: read tcp 10.34.160.50:50460->34.83.180.141:443: read: connection reset by peer"
E0529 21:32:09.334853 17304 v2.go:150] "Unhandled Error" err="next reader: read tcp 10.34.160.50:50460->34.83.180.141:443: read: connection reset by peer"
error: error reading from error stream: next reader: read tcp 10.34.160.50:50460->34.83.180.141:443: read: connection reset by peer
error:
exit status 1
The kube-apiserver logs confirm the problem, but it seems to happen on the worker node: https://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-ubuntu-gce-containerd/1795925976450863104/artifacts/bootstrap-e2e-master/
E0529 21:32:09.315375 10 conn.go:339] Error on socket receive: read tcp 10.40.0.2:443->34.121.237.185:50460: use of closed network connection
I0529 21:32:09.317154 10 httplog.go:132] "HTTP" verb="CONNECT" URI="/api/v1/namespaces/kubectl-6332/pods/httpd/exec?command=tar&command=-xmf&command=-&command=-C&command=%2Ftmp&container=httpd&stderr=true&stdin=true&stdout=true" latency="5.045622494s" userAgent="kubectl/v1.31.0 (linux/amd64) kubernetes/e821e4f" audit-ID="030c7a1d-0602-402b-932f-397799adf61f" srcIP="34.121.237.185:50460" hijacked=true
I0529 21:32:09.317791 10 conn.go:134] closing connection dialID 6172859311107220987 connectionID 742
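To correlate the two ends, grepping the downloaded kube-apiserver.log artifact for the client ephemeral port works; a rough sketch, assuming the log file from the GCS link above has been saved locally under that name:
# the 50460 client port appears in both the kubectl error and the apiserver log
grep -n '50460' kube-apiserver.log
# narrow to the failure second to catch the CONNECT/hijack and connection-close lines
grep -n '21:32:09' kube-apiserver.log | grep -E 'httplog|conn\.go'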
The job logs indicate that the pod probes also fail:
STEP: Found 6 events. - k8s.io/kubernetes/test/e2e/framework/debug/dump.go:46 @ 05/29/24 21:32:10.323
I0529 21:32:10.323658 9867 dump.go:53] At 2024-05-29 21:31:49 +0000 UTC - event for httpd: {default-scheduler } Scheduled: Successfully assigned kubectl-6332/httpd to bootstrap-e2e-minion-group-3h2k
I0529 21:32:10.323693 9867 dump.go:53] At 2024-05-29 21:31:52 +0000 UTC - event for httpd: {kubelet bootstrap-e2e-minion-group-3h2k} Pulled: Container image "registry.k8s.io/e2e-test-images/httpd:2.4.38-4" already present on machine
I0529 21:32:10.323701 9867 dump.go:53] At 2024-05-29 21:31:52 +0000 UTC - event for httpd: {kubelet bootstrap-e2e-minion-group-3h2k} Created: Created container httpd
I0529 21:32:10.323709 9867 dump.go:53] At 2024-05-29 21:31:54 +0000 UTC - event for httpd: {kubelet bootstrap-e2e-minion-group-3h2k} Started: Started container httpd
I0529 21:32:10.323716 9867 dump.go:53] At 2024-05-29 21:32:09 +0000 UTC - event for httpd: {kubelet bootstrap-e2e-minion-group-3h2k} Killing: Stopping container httpd
I0529 21:32:10.323722 9867 dump.go:53] At 2024-05-29 21:32:09 +0000 UTC - event for httpd: {kubelet bootstrap-e2e-minion-group-3h2k} Unhealthy: Readiness probe failed: Get "http://10.64.4.226:80/": dial tcp 10.64.4.226:80: connect: connection refused
The kubelet logs also suggest the problem is within the node:
httplog.go:132] "HTTP" verb="POST" URI="/exec/nettest-1794/test-container-pod/webserver?command=%2Fbin%2Fsh&command=-c&command=curl+-g+-q+-s+%27http%3A%2F%2F10.64.4.146%3A8083%2Fdial%3Frequest%3Dhostname%26protocol%3Dhttp%26host%3D10.0.192.209%26port%3D80%26tries%3D1%27&error=1&output=1" latency="432.25506ms" userAgent="e2e.test/v1.31.0 (linux/amd64) kubernetes/e821e4f -- [sig-network] Networking Granular Checks: Services should function for multiple endpoint-Services with same selector" audit-ID="" srcIP="10.64.4.3:44658" hijacked=true
May 29 21:32:09.715792 bootstrap-e2e-minion-group-3h2k kubelet[8827]: I0529 21:32:09.714632 8827 prober.go:155] "HTTP-Probe" scheme="http" host="10.64.4.226" port="80" path="/" timeout="5s" headers=null
May 29 21:32:09.715792 bootstrap-e2e-minion-group-3h2k kubelet[8827]: I0529 21:32:09.715160 8827 prober.go:107] "Probe failed" probeType="Readiness" pod="kubectl-6332/httpd" podUID="7d35af69-1f80-438b-9ce6-a657e3fc9769" containerName="httpd" probeResult="failure" output="Get \"http://10.64.4.226:80/\": dial tcp 10.64.4.226:80: connect: connection refused"
May 29 21:32:09.715792 bootstrap-e2e-minion-group-3h2k kubelet[8827]: I0529 21:32:09.715496 8827 event.go:389] "Event occurred" object="kubectl-6332/httpd" fieldPath="spec.containers{httpd}" kind="Pod" apiVersion="v1" type="Warning" reason="Unhealthy" message="Readiness probe failed: Get \"http://10.64.4.226:80/\": dial tcp 10.64.4.226:80: connect: connection refused"
Can't say what happened, but everything points to the node having some problems at that time.
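If anyone wants to pull more of the kubelet side while a repro cluster is still up, a rough sketch (it assumes gcloud SSH access to the minion, a UTC node clock, and the default "kubelet" systemd unit name):
# dump kubelet logs around the failure window and filter for the probe / pod under test
gcloud compute ssh bootstrap-e2e-minion-group-3h2k \
  --command 'sudo journalctl -u kubelet --since "2024-05-29 21:31:45" --until "2024-05-29 21:32:15"' \
  | grep -iE 'probe|kubectl-6332/httpd'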
from kubernetes.
i don't think kube-up uses the GCP CCM, it's just some bash that provisions GCE nodes.
looks like folks on this thread are already debugging, but if you know someone from GCP, please cc them.
also FTR, there is an effort to replace kube-up jobs with kOps jobs:
kubernetes/enhancements#4224
from kubernetes.
i don't think kube-up uses the GCP CCM, it's just some bash that provisions GCE nodes.
@neolit123 it does: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-ubuntu-gce-containerd/1795925976450863104/artifacts/bootstrap-e2e-master/cloud-controller-manager.log ; kube-up.sh was moved to the external CCM with the deprecation.
from kubernetes.
let's reopen if needed.
/close
from kubernetes.
/remove-sig cli
from kubernetes.
/sig cloud-provider
/area provider/gcp
from kubernetes.
/sig node
cc @SergeyKanzhelev @mrunalp
from kubernetes.
I think the readiness probe failures might be WAI (working as intended):
After the test fails, we are deleting the pods forcefully:
I0529 21:32:09.341287 9867 builder.go:121] Running '/workspace/kubernetes/platforms/linux/amd64/kubectl --server=https://34.83.180.141 --kubeconfig=/workspace/.kube/config --namespace=kubectl-6332 delete --grace-period=0 --force -f -'
I0529 21:32:09.682128 9867 builder.go:146] stderr: "Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.\n"
I0529 21:32:09.682156 9867 builder.go:147] stdout: "pod \"httpd\" force deleted\n"
and I see readiness probe failures after that:
May 29 21:32:09.715792 bootstrap-e2e-minion-group-3h2k kubelet[8827]: I0529 21:32:09.715160 8827 prober.go:107] "Probe failed" probeType="Readiness" pod="kubectl-6332/httpd" podUID="7d35af69-1f80-438b-9ce6-a657e3fc9769" containerName="httpd" probeResult="failure" output="Get \"http://10.64.4.226:80/\": dial tcp 10.64.4.226:80: connect: connection refused"
May 29 21:32:09.715792 bootstrap-e2e-minion-group-3h2k kubelet[8827]: I0529 21:32:09.715496 8827 event.go:389] "Event occurred" object="kubectl-6332/httpd" fieldPath="spec.containers{httpd}" kind="Pod" apiVersion="v1" type="Warning" reason="Unhealthy" message="Readiness probe failed: Get \"http://10.64.4.226:80/\": dial tcp 10.64.4.226:80: connect: connection refused"
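If that is what is happening, it should be reproducible on any cluster without the e2e framework; a minimal sketch (the pod name and the 1s probe period are illustrative, and whether the Unhealthy event fires depends on the timing between the container stop and pod removal):
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: httpd-probe-test
spec:
  containers:
  - name: httpd
    image: registry.k8s.io/e2e-test-images/httpd:2.4.38-4
    readinessProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 1
EOF
kubectl wait --for=condition=Ready pod/httpd-probe-test --timeout=60s
# force delete, mirroring what the e2e framework does after the failure
kubectl delete pod httpd-probe-test --grace-period=0 --force
# look for the same Unhealthy / "connection refused" readiness event
kubectl get events --field-selector involvedObject.name=httpd-probe-test,reason=Unhealthy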
from kubernetes.
@AnishShah good catch, so forget about the probes; we need to understand why the connection is reset when trying to create the tunnel to the pod.
from kubernetes.
@neolit123, we are reviewing this bug in the sig cloud-provider meeting today, but it's not clear what the cloud-provider (or ccm) specific issue is. is there something we are missing?
from kubernetes.
@elmiko the job uses kube-up, which still lives in k/k and is owned by the GCP provider:
https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/OWNERS#L26-L27
kube-up used to be owned by SIG CL but was "donated" to SIG CP / GCP provider.
the job itself is also owned by SIG CP ATM, see:
https://github.com/kubernetes/test-infra/blob/6725736ff16b7bc7cb16af9c69040537618166a1/config/jobs/kubernetes/sig-cloud-provider/gcp/gcp-gce.yaml#L801
that said, the cause of the failure could be in the kubelet and thus on the SIG Node side, however i doubt that.
from kubernetes.
@neolit123 ack, thank you for the extra context. do you think this is an issue with the gcp ccm or more the kube-up script?
i'm trying to understand how we can help from the sig, or if there is something we can do. i personally don't have deep experience with the gcp test suite; i'm wondering if we need someone from the gcp cloud provider maintainers to investigate further?
from kubernetes.
there is another type of flake for this job which is:
e2e.go: diffResources 0s
{ Error: 2 leaked resources
+NAME MACHINE_TYPE PREEMPTIBLE CREATION_TIMESTAMP
+bootstrap-e2e-minion-template e2-standard-2 2024-06-04T16:56:53.392-07:00}
that's GCE specific.
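For that flake, checking whether the instance template really leaked can be done against the test project; a sketch, with the bootstrap-e2e name prefix taken from the diffResources output above:
# list instance templates left behind by the e2e run
gcloud compute instance-templates list --filter="name ~ ^bootstrap-e2e"
# and, if it is confirmed to be an orphan from a finished run:
gcloud compute instance-templates delete bootstrap-e2e-minion-template --quiet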
from kubernetes.
looks like folks on this thread are already debugging, but if you know someone from GCP, please cc them.
cc @cheftako @andrewsykim, if you have a chance, would love your thoughts
from kubernetes.
Notes from sig-node CI meeting:
- This particular subtest is green in the testgrid, so we can consider it as not blocking for the 1.31 alpha release.
from kubernetes.
we need to understand why the connection is reset when trying to create the tunnel to the pod
I'm trying to check the konnectivity-agent logs, but I cannot find them; the pods do seem to be in Ready state when the subtest failed.
from kubernetes.
One additional data point: the same test(s) are working fine in the ec2 variant of the CI job:
https://testgrid.k8s.io/amazon-ec2#ec2-ubuntu-master-containerd&width=20&include-filter-by-regex=Simple%20pod%20should%20return%20command%20exit
from kubernetes.
Hey Folks,
This test is still flaking, recent failures:
5/30/2024, 2:41:34 AM ci-kubernetes-e2e-ubuntu-gce-containerd
5/26/2024, 7:42:14 AM ci-kubernetes-e2e-gci-gce-network-proxy-grpc
5/22/2024, 8:30:21 PM ci-kubernetes-e2e-gci-gce-alpha-enabled-default
from kubernetes.
we need to understand why the connection is reset when trying to create the tunnel to the pod
I'm trying to check the konnectivity-agent logs, but I cannot find them; the pods do seem to be in Ready state when the subtest failed.
the konnectivity agent is indeed one of the most likely causes, @AnishShah, especially since dims says it is not failing on the AWS jobs, which AFAIK don't use it.
@AnishShah you can get the konnectivity server logs on the master VM: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-ubuntu-gce-containerd/1795925976450863104/artifacts/bootstrap-e2e-master/konnectivity-server.log ; we need to check the log dump script to get the agent logs too.
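FWIW, if a repro cluster is still around, the agent side can also be pulled directly; a sketch, assuming kube-up deploys the agent into kube-system with the k8s-app=konnectivity-agent label:
# list the agent pods and grab their recent logs (the label selector is an assumption)
kubectl -n kube-system get pods -l k8s-app=konnectivity-agent -o wide
kubectl -n kube-system logs -l k8s-app=konnectivity-agent --tail=200 --prefix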
from kubernetes.
hmm, checking https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-ubuntu-gce-containerd/1795925976450863104
failure is
stderr:
E0529 21:32:09.334805 17304 v2.go:167] "Unhandled Error" err="next reader: read tcp 10.34.160.50:50460->34.83.180.141:443: read: connection reset by peer"
E0529 21:32:09.334808 17304 v2.go:129] "Unhandled Error" err="next reader: read tcp 10.34.160.50:50460->34.83.180.141:443: read: connection reset by peer"
E0529 21:32:09.334853 17304 v2.go:150] "Unhandled Error" err="next reader: read tcp 10.34.160.50:50460->34.83.180.141:443: read: connection reset by peer"
error: error reading from error stream: next reader: read tcp 10.34.160.50:50460->34.83.180.141:443: read: connection reset by peer
error:
exit status 1
In [It] at: k8s.io/kubernetes/test/e2e/framework/kubectl/builder.go:91 @ 05/29/24 21:32:09.34
pod events
I0529 21:32:10.323709 9867 dump.go:53] At 2024-05-29 21:31:54 +0000 UTC - event for httpd: {kubelet bootstrap-e2e-minion-group-3h2k} Started: Started container httpd
I0529 21:32:10.323716 9867 dump.go:53] At 2024-05-29 21:32:09 +0000 UTC - event for httpd: {kubelet bootstrap-e2e-minion-group-3h2k} Killing: Stopping container httpd
I0529 21:32:10.323722 9867 dump.go:53] At 2024-05-29 21:32:09 +0000 UTC - event for httpd: {kubelet bootstrap-e2e-minion-group-3h2k} Unhealthy: Readiness probe failed: Get "http://10.64.4.226:80/": dial tcp 10.64.4.226:80: connect: connection refused
May 29 21:32:09.554745 bootstrap-e2e-minion-group-3h2k kubelet[8827]: I0529 21:32:09.554597 8827 kubelet.go:2441] "SyncLoop DELETE" source="api" pods=["kubectl-6332/httpd"]
May 29 21:32:09.554745 bootstrap-e2e-minion-group-3h2k kubelet[8827]: I0529 21:32:09.554645 8827 pod_workers.go:854] "Pod is marked for graceful deletion, begin teardown" pod="kubectl-6332/httpd" podUID="7d35af69-1f80-438b-9ce6-a657e3fc9769" updateType="update"
Something is deleting the pod under test; that is why the test fails. cc: @bobbypage @SergeyKanzhelev
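One way to confirm which client issued that DELETE is the apiserver audit log from the same artifacts directory; a sketch, assuming the audit artifact is named kube-apiserver-audit.log and using the audit.k8s.io Event fields:
# print who deleted the httpd pod, with user agent, stage, and timestamp
jq -c 'select(.verb=="delete" and .objectRef.resource=="pods" and .objectRef.name=="httpd")
       | {user: .user.username, agent: .userAgent, stage: .stage, ts: .requestReceivedTimestamp}' \
   kube-apiserver-audit.log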
from kubernetes.
/assign @cheftako
/triage accepted
from kubernetes.
xref: #126192
from kubernetes.
Hey folks! The release cycle for 1.32 starts today. Since this is still open, I will carry it over to the latest milestone.
It looks like this test is still flaking:
from kubernetes.
@dims: Closing this issue.
In response to this:
let's reopen if needed.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
from kubernetes.